I want to make fuzzy match for email or telephone by Elasticsearch. For example:
match all emails end with @gmail.com
or
match all telephone startwith 136
.
I know I can use wildcard,
{
"query": {
"wildcard" : {
"email": "*gmail.com"
}
}
}
but the performance is very poor. I tried to use regexp:
{"query": {"regexp": {"email": {"value": "*163.com*"} } } }
But doesn't work.
Is there better way to make it?
curl -XGET localhost:9200/user_data
{
"user_data": {
"aliases": {},
"mappings": {
"user_data": {
"properties": {
"address": {
"type": "string"
},
"age": {
"type": "long"
},
"comment": {
"type": "string"
},
"created_on": {
"type": "date",
"format": "dateOptionalTime"
},
"custom": {
"properties": {
"key": {
"type": "string"
},
"value": {
"type": "string"
}
}
},
"gender": {
"type": "string"
},
"name": {
"type": "string"
},
"qq": {
"type": "string"
},
"tel": {
"type": "string"
},
"updated_on": {
"type": "date",
"format": "dateOptionalTime"
},
}
}
},
"settings": {
"index": {
"creation_date": "1458832279465",
"uuid": "Fbmthc3lR0ya51zCnWidYg",
"number_of_replicas": "1",
"number_of_shards": "5",
"version": {
"created": "1070299"
}
}
},
"warmers": {}
}
}
the mapping:
{
"settings": {
"analysis": {
"analyzer": {
"index_phone_analyzer": {
"type": "custom",
"char_filter": [ "digit_only" ],
"tokenizer": "digit_edge_ngram_tokenizer",
"filter": [ "trim" ]
},
"search_phone_analyzer": {
"type": "custom",
"char_filter": [ "digit_only" ],
"tokenizer": "keyword",
"filter": [ "trim" ]
},
"index_email_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase", "name_ngram_filter", "trim" ]
},
"search_email_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase", "trim" ]
}
},
"char_filter": {
"digit_only": {
"type": "pattern_replace",
"pattern": "\D+",
"replacement": ""
}
},
"tokenizer": {
"digit_edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "3",
"max_gram": "15",
"token_chars": [ "digit" ]
}
},
"filter": {
"name_ngram_filter": {
"type": "ngram",
"min_gram": "3",
"max_gram": "20"
}
}
}
},
"mappings" : {
"user_data" : {
"properties" : {
"name" : {
"type" : "string",
"analyzer" : "ik"
},
"age" : {
"type" : "integer"
},
"gender": {
"type" : "string"
},
"qq" : {
"type" : "string"
},
"email" : {
"type" : "string",
"analyzer": "index_email_analyzer",
"search_analyzer": "search_email_analyzer"
},
"tel" : {
"type" : "string",
"analyzer": "index_phone_analyzer",
"search_analyzer": "search_phone_analyzer"
},
"address" : {
"type": "string",
"analyzer" : "ik"
},
"comment" : {
"type" : "string",
"analyzer" : "ik"
},
"created_on" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"updated_on" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"custom": {
"type" : "nested",
"properties" : {
"key" : {
"type" : "string"
},
"value" : {
"type" : "string"
}
}
}
}
}
}
}
See Question&Answers more detail:
os