I've just ventured into the seemingly simple but actually very complex world of search. For one application, I need to build a mechanism for searching users by name.
After reading numerous posts and articles, including:
How can I use Lucene for personal name (first name, last name) search?
http://dublincore.org/documents/1998/02/03/name-representation/
what's the best way to search a social network by prioritizing a users relationships first?
http://www.gossamer-threads.com/lists/lucene/java-user/120417
Lucene Index and Query Design Question - Searching People
Lucene Fuzzy Search for customer names and partial address
...and a few others I can't find at the moment. After getting at least indexing and basic search working on my machine, I have devised the following scheme for searching users:
1) Have first, second, and third name fields and index them with Solr
2) Use edismax as the query parser (defType=edismax) for searching across multiple fields
3) Use a combination of normalization filters such as transliteration, Latin-to-ASCII conversion, etc.
4) Finally, use fuzzy search
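To make the scheme concrete, here is a toy in-memory analogue of steps 1 to 4 in plain Python, not Solr. It assumes hypothetical field names (`first_name`, `second_name`, `third_name`), folds accents with Unicode NFKD as a rough stand-in for an ASCII-folding filter, and treats "fuzzy" as plain Levenshtein distance with at most 2 edits (Lucene's fuzzy query also allows transpositions by default, which this sketch does not). Each query term is matched against every name field and the best field wins, loosely like edismax querying several fields at once.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip combining accents (a rough ASCII-folding analogue).

    NFKD splits "ö" into "o" plus a combining diaeresis, which is dropped.
    Letters with no decomposition, such as "ø" or "ß", pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical field names, standing in for the Solr schema fields.
NAME_FIELDS = ("first_name", "second_name", "third_name")


def search(users: list[dict], query: str, max_edits: int = 2) -> list[tuple[int, dict]]:
    """Score users by matching every query term against all name fields.

    A term scores 2 for an exact (folded) match and 1 for a fuzzy match within
    max_edits; the best field wins per term. Users with any unmatched term are
    dropped. Results are sorted by descending score.
    """
    terms = [fold(t) for t in query.split()]
    results = []
    for user in users:
        values = [fold(user[f]) for f in NAME_FIELDS if user.get(f)]
        total = 0
        for term in terms:
            best = 0
            for value in values:
                if value == term:
                    best = 2
                elif levenshtein(value, term) <= max_edits:
                    best = max(best, 1)
            if best == 0:
                break  # this term matched no field, so skip the user
            total += best
        else:
            results.append((total, user))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results
```

For example, with users Jörn Berg and John Smith, the query `jorn` ranks Jörn first (exact after folding) and John second (one edit away), while `Jorn Berg` returns only Jörn, because `berg` is too far from every field of John Smith. Note that 2 edits is generous for four-letter names; a real setup may want the allowed distance to depend on term length.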
Being very new to this, I am unsure whether the above is the best approach, and I would like to hear from more experienced users who know this field better than I do.
I need to be able to match names in the following ways:
1) Accent folding: Jorn matches Jörn and vice versa
2) Alternative spellings: Karl matches Carl and vice versa
3) Shortened forms (I believe this is done with the SynonymFilterFactory): Sue matches Susanne, etc.
4) Levenshtein matching: Jonn matches John, etc.
5) Soundex matching: Elin matches Ellen
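As a rough illustration of how requirements 1, 2, 3, and 5 could combine, here is a minimal Python sketch of a single-name predicate. The `NAME_VARIANTS` list is a hypothetical, hand-written stand-in for a synonyms file, and the phonetic step is classic American Soundex; in Solr itself these roles would typically fall to a synonym filter and a phonetic filter (for example `solr.PhoneticFilterFactory`), whose exact configuration this sketch does not reproduce. One thing the sketch makes visible: Soundex keeps the first letter, so Karl (K640) and Carl (C640) never match phonetically and need the variant list or a different encoder.

```python
import unicodedata

# Hypothetical, hand-written equivalence classes; a real system would load a
# curated nickname/variant list instead.
NAME_VARIANTS = [
    {"karl", "carl"},
    {"sue", "susanne", "susan", "suzanne"},
    {"jon", "john", "jonathan"},
]

# American Soundex digit for each consonant; vowels, y, h, w have no digit.
_SOUNDEX_CODES = {ch: digit
                  for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                         ("l", "4"), ("mn", "5"), ("r", "6"))
                  for ch in letters}


def fold(text: str) -> str:
    """Lowercase and strip combining accents, so "Jörn" becomes "jorn"."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [ch for ch in fold(name) if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with the same digit
            prev = digit    # a vowel resets prev to "", so a repeat after it counts
    return (code + "000")[:4]


def variants(name: str) -> set[str]:
    """The folded name plus every variant listed alongside it."""
    folded = fold(name)
    found = {folded}
    for group in NAME_VARIANTS:
        if folded in group:
            found |= group
    return found


def names_match(a: str, b: str) -> bool:
    """True if two names agree after folding, via the variant list, or by Soundex."""
    if variants(a) & variants(b):
        return True
    return soundex(a) == soundex(b)
```

With this, `names_match("Jorn", "Jörn")`, `names_match("Karl", "Carl")`, `names_match("Sue", "Susanne")`, and `names_match("Elin", "Ellen")` are all true. Soundex is coarse, though: Robert and Rupert both encode to R163, so a phonetic match is usually better used as a lower-weighted signal than as an exact one.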
Any guidance, criticism, or comments are very welcome. Please let me know whether this is possible... or whether I'm just daydreaming. :)
EDIT
I should add that I also have a fullname field for people with long names. To take an example from one of the posts: Jon Paul or Del Carmen should also match Jon Paul Del Carmen.
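One way to read the fullname requirement is as a phrase match: the query's tokens must appear as a contiguous run inside the tokenized full name. The Python sketch below shows that idea in isolation, assuming tokens are split on whitespace and hyphens and accent-folded; it is not Solr's analysis chain. In Solr terms this roughly corresponds to a phrase query against a tokenized fullname field (edismax's `pf` parameter can boost such phrase matches), though the exact tokenization there depends on the field's analyzer.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip combining accents."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokens(name: str) -> list[str]:
    """Folded tokens, splitting on whitespace and hyphens (an assumed tokenizer)."""
    return [t for t in re.split(r"[\s\-]+", fold(name)) if t]


def phrase_in_fullname(query: str, fullname: str) -> bool:
    """True if the query's tokens appear as a contiguous, in-order run in the full name."""
    q, f = tokens(query), tokens(fullname)
    if not q:
        return False
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))
```

Here both `Jon Paul` and `del carmen` match `Jon Paul Del Carmen`, while `Paul Jon` does not, because order matters in a phrase. Relaxing that (for example, allowing any order) trades precision for recall.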
Since this is a new project, I can modify the schema and architecture however I see fit, so there are very few restrictions.