I have a Solr 8.7.0 installation and I'm using the Data Handler importer plugin via a MySQLi connection.
I have four entities declared:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/hmsscot_bassculture"
user="myuser"
password="mypw"/>
<document>
<entity name="author" query="select id,type,firstname,surname,biographical_info,extrainfo from bassculture_author">
<field column="id" name="id"/>
<field column="type" name="type"/>
<field column="firstname" name="firstname"/>
<field column="surname" name="surname"/>
<field column="biographical_info" name="biographical_info"/>
<field column="extrainfo" name="extrainfo"/>
</entity>
<entity name="source" query="select id,type,short_title,full_title,publisher,author_id,orientation,variants from bassculture_source">
<field column="id" name="id"/>
<field column="type" name="type"/>
<field column="short_title" name="short_title"/>
<field column="full_title" name="full_title"/>
<field column="publisher" name="publisher"/>
<field column="author_id" name="author_id"/>
<entity name="author" query="SELECT s.*, CONCAT(ba.firstname, ' ', ba.surname) AS author FROM bassculture_source s, bassculture_author ba WHERE s.id=${source.id} AND s.author_id = ba.id;">
<field column="author" name="author"/>
</entity>
<field column="description" name="description"/>
<field column="orientation" name="orientation"/>
<field column="variants" name="variants"/>
</entity>
<entity name="copy" query="select id,type,folder,source_id,item_notes,seller,library,shelfmark,pagination,dimensions from bassculture_item">
<field column="id" name="id"/>
<field column="type" name="type"/>
<field column="folder" name="folder"/>
<field column="source_id" name="source_id"/>
<entity name="source_title" query="select id,short_title from bassculture_source where id=${copy.source_id}">
<field column="short_title" name="source_title"/>
</entity>
<entity name="source_author" query="SELECT bt.*, CONCAT(ba.firstname, ' ', ba.surname) AS source_author FROM bassculture_tune bt, bassculture_item c, bassculture_source s, bassculture_author ba WHERE c.id=${copy.id} AND c.source_id = s.id AND s.author_id = ba.id;">
<field column="source_author" name="source_author"/>
</entity>
<field column="item_notes" name="item_notes"/>
<field column="seller" name="seller"/>
<field column="library" name="library"/>
<field column="shelfmark" name="shelfmark"/>
<field column="paginations" name="pagination"/>
<field column="dimensions" name="dimension"/>
</entity>
<entity name="tune" query="select id,type,name,start_page,alternate_spellings,item_id from bassculture_tune">
<field column="id" name="id"/>
<field column="type" name="type"/>
<field column="name" name="name"/>
<entity name="source_title" query="select s.* FROM bassculture_source s, bassculture_item c, bassculture_tune bt where bt.id=${tune.id} AND c.source_id = s.id AND bt.item_id = c.id">
<field column="short_title" name="source_title"/>
</entity>
<entity name="tune_author" query="SELECT bt.*, CONCAT(ba.firstname, ' ', ba.surname, ' ', ba.extrainfo) AS tune_author FROM bassculture_tune bt, bassculture_item c, bassculture_source s, bassculture_author ba WHERE bt.id=${tune.id} AND bt.item_id = c.id AND c.source_id = s.id AND s.author_id = ba.id;">
<field column="tune_author" name="tune_author" />
</entity>
<field column="start_page" name="start_page"/>
<field column="alternate_spellings" name="alternate_spellings"/>
<field column="item_id" name="item_id"/>
</entity>
</document>
</dataConfig>
Now, I'm experiencing something which doesn't make sense to me. If I run the data importer leaving the 'entity' drop-down blank (i.e. import all entities):
I get:
Indexing completed. Added/Updated: 2357 documents. Deleted 0 documents. (Duration: 13s)
This is the correct number of documents (authors+sources+copies+tunes). Nevertheless, when I query the database I only get 1938 documents:
"responseHeader":{
"status":0,
"QTime":103,
"params":{
"q":"*:*",
"_":"1609335106436"}},
"response":{"numFound":1938,"start":0,"numFoundExact":true,"docs":[
{
[...]
This are only the tunes (last entity in the configuration file above). I also see this in the dashboard:
If on the other hand I select the entities one by one (e.g. author etc...):
the plugin imports correctly the author, tune, and copy entities (each time the . query reflects the documents imported). Once I get to the fourth entity though (tune), the index apparently 'forgets' about the previous three entities - although after running it, plugin reports 'documents deleted: 0' - and the . query goes back to only 1938 documents found (i.e. only tunes).
There's no error message in the logs. What am I missing?
PARTIAL SOLUTION
I managed to add a prefix to the id in order to differentiate the four different data, so that unique IDs don't get rewritten, eg:
SELECT name,start_page,alternate_spellings,item_id, CONCAT('tune_', id) AS id, 'tune' as type FROM bassculture_tune;
Nevertheless, I need the database id (without the prefix) of the current tune, in this case, for some later comparison, eg:
<entity name="tune_author" query="SELECT bt.*, CONCAT(ba.firstname, ' ', ba.surname, ' ', ba.extrainfo) AS tune_author FROM bassculture_tune bt, bassculture_item c, bassculture_source s, bassculture_author ba WHERE bt.id=${tune.id} AND bt.item_id = c.id AND c.source_id = s.id AND s.author_id = ba.id;">
<field column="tune_author" name="tune_author" />
</entity>
Since ${tune.id} now has a prefix the whole query doesn't do what I need any more. Is there a way to strip the prefix locally?
Edit 2
The query
<entity name="tune_author" query="select s.* FROM bassculture_source s, bassculture_item c, bassculture_tune bt WHERE bt.id=REPLACE(${tune.id}, 'tune_', '') AND c.source_id = s.id AND bt.item_id = c.id;">
throws an error (unable to execute query) on importing data on Solr.
This is the error in the Solr log:
Caused by: java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT REPLACE(tune_1, 'tune_', ''), AND c.source_id = s.id AND bt.item_id = c.i' at line 1
PS
Something like
select item_id FROM bassculture_tune bt WHERE bt.id= (SELECT REPLACE('tune_1', 'tune_', ''));
on MySQL console works just fine.
Introducing variables
I'm trying my luck with a variable now:
<entity name="this_tune_id" query="SET @this_tune_id = REPLACE('${tune.id}','tune_','');">
</entity>
<entity name="source_title" query="select s.* FROM bassculture_source s, bassculture_item c, bassculture_tune bt WHERE c.source_id = s.id AND bt.item_id = c.id AND bt.id = ${this_tune_id};">
<field column="short_title" name="source_title"/>
</entity>
This gives me a
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 1
error.
FINAL SOLUTION
I am storing the database ID as this_tune_id, and the Solr id (with the prefix) as id, so that I can use this_tune_id for my queries while still storing a prefixed id in Solr:
<entity name="tune" query="SELECT name,start_page,alternate_spellings,item_id, id AS this_tune_id, CONCAT('tune_', id) AS id, 'tune' as type FROM bassculture_tune;">
<field column="name" name="name"/>
<entity name="source_title" query="select s.* FROM bassculture_source s, bassculture_item c, bassculture_tune bt WHERE c.source_id = s.id AND bt.item_id = c.id AND bt.id = ${tune.this_tune_id};">