elasticsearch data increase & duplicate at each restart

Question

Welcome To Ask or Share your Answers For Others

elasticsearch data increase & duplicate at each restart

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

elasticsearch data increase & duplicate at each restart

I'm using elasticsearch with angularjs and oracle on windows 7. it's working more & more finer ( thanks to stackoverflower help ). I have a problem with elasticsearch: the number of elements in my document is increasing and i don't know why/how. My oracle table indexed by elasticsearch contain 12010 elements, now i got 84070 elements in elastic document (frequently checked by curl _count): so it duplicate the data 7 times now. I re-indexed the table few days ago but i remove elasticsearch "data" folder before.

data seems to increase each time i restart windows.

Thanks for help.

This is how i install and index my data :

I do this only the first time :

unzip elastic in folder : D:workelasticsearch-1.3.1
install web interface : >plugin -install mobz/elasticsearch-head
install jdbc : >plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.3.0.0/elasticsearch-river-jdbc-1.3.0.0-plugin.zip
copy "ojdbc6-11.2.0.3.jar" to "D:workelasticsearch-1.3.1pluginsjdbc"
service.bat install
service.bat start

creating index

curl -XPOST 'localhost:9200/donnees'

mapping :

curl -XPUT 'localhost:9200/donnees/specimens/_mapping' -d '{
"specimens" : {
    "_all" : {"enabled" : true},
    "_index" : {"enabled" : true},
    "_id" : {"index": "not_analyzed", "store" : false},
    "properties" : {
        "O_OCCURRENCEID"                                : {"type" : "string",   "store" : "no","index": "not_analyzed"  } ,
            .... 
        "I_INSTITUTIONCODE"                             : {"type" : "string",   "store" : "yes","index": "analyzed" } 
    }
}}'

query oracle and index data :

curl -XPUT 'localhost:9200/_river/donnees_s/_meta' -d '{
 "type" : "jdbc",
 "jdbc" : {
    "index" : "donnees",
    "type" : "specimens",
    "url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
     "user" : "user",
     "password" : "password",
     "sql" : "select * from all_specimens_data"
   }
}'

( is this correct ?? it doesn't work if i replace "curl -XPUT 'localhost:9200/_river/donnees_s/_meta'" by "curl -XPUT 'localhost:9200/donnees/specimens/_meta' which i use to query )

test :

curl -XGET 'http://localhost:9200/donnees/specimens/_count?q=*'
    => 12010
curl -XGET 'http://localhost:9200/donnees/specimens/_search?q=P00009359'
    => return data ok

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:31:48+0000

Resolved thanks to Konstantin V. Salikhov.

Each time elasticsearch service start it query the database with the sql provided to the _river and get the data ( see me previous "query oracle and index data : "). If the data don't have an "_id" column _river can't determine which records it have already loaded and the data is duplicated each time. To avoid duplicate i edit my "all_specimens_data" table in database ( who is in fact a view to avoid modification o database) and rename "O_OCCURRENCEID" to "_id", "O_OCCURRENCEID" is my primary key UUID.

hope this help other

Categories

elasticsearch data increase & duplicate at each restart

elasticsearch data increase & duplicate at each restart

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags