Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How to implement "lowercase" in an ElasticSearch query?

I am trying to implement lowercase functionality in ElasticSearch. According to their API docs here

https://www.elastic.co/guide/en/elasticsearch/reference/current/lowercase-processor.html

you use this code snippet

{
  "lowercase": {
    "field": "foo"
  }
}

in the query to get a lowercase value contained in the required field.

They do not have a specific example and I keep getting an error when I execute a search query.

This is what I tried:

POST /users/_search
{
  "size" : 10,
  "_source" : {
    "includes" : [
      "userid",
      "username"
    ]
  },
  "query" : {
    "query_string" : {
      "query" : "*John*",
      "lowercase": { "default_field" : "username.keyword" }
    }
  },
  "sort" : [
    {
      "_doc" : {
        "order" : "desc"
      }
    }
  ]
}

In the above query I try to find a username 'john' (converted 'John' to lowercase).

Error message is as follows:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "[query_string] unknown token [START_OBJECT] after [lowercase]",
        "line" : 18,
        "col" : 27
      }
    ],
    "type" : "parsing_exception",
    "reason" : "[query_string] unknown token [START_OBJECT] after [lowercase]",
    "line" : 18,
    "col" : 27
  },
  "status" : 400
}

The same query works (although it does not give me the result that I need) if I replace

 "lowercase": { "default_field" : "username.keyword"}

with

  "default_field" : "username.keyword"

Any suggestions about how I can fix this query? Thanks!



1 Answer


The processor you're referencing is part of an ingest pipeline, which lets you transform your data before it gets indexed. It has nothing to do with querying data.

You can simulate an ingest pipeline like so:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "lowercase": {
          "field": "username"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "username": "John"
      }
    }
  ]
}

Check this answer to see pipelines in action.
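
To make the distinction concrete, here is a minimal Python sketch of what a lowercase ingest processor conceptually does to a document before it is stored. This is a toy model, not Elasticsearch code: the helper name apply_lowercase_processor is made up for illustration, and the real processor has more options than shown here.

```python
def apply_lowercase_processor(source, field):
    """Toy stand-in for the ingest `lowercase` processor (hypothetical helper)."""
    if field not in source:
        # Assumption: mirror the processor's default of failing on a missing
        # field (the real processor can skip this via `ignore_missing`).
        raise KeyError("field [%s] not present in document" % field)
    updated = dict(source)  # copy, so the caller's document is left untouched
    updated[field] = str(updated[field]).lower()
    return updated


doc = {"username": "John"}
print(apply_lowercase_processor(doc, "username"))  # {'username': 'john'}
```

The point of the sketch is that the change happens to the document at index time. Nothing in it affects how a search request is parsed, which is why placing "lowercase" inside query_string is rejected.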


As you ingest textual data into Elasticsearch, it gets analyzed and tokenized. The default analyzer is the standard analyzer and you can see how it'd tokenize the word "John" when you run:

GET _analyze
{
  "text": "John",
  "analyzer": "standard"
}
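
If you don't have a cluster at hand, the following Python sketch roughly approximates what the standard analyzer does to simple input like this. It is only an approximation under stated assumptions: the function name is hypothetical, and the real analyzer uses Unicode text segmentation rules, so edge cases (punctuation, non-Latin scripts) will differ.

```python
import re


def approximate_standard_analyzer(text):
    """Rough approximation: split into word-like tokens, then lowercase each."""
    # \w+ grabs runs of letters, digits and underscores; the real tokenizer
    # follows Unicode segmentation rules and is more nuanced than this.
    return [token.lower() for token in re.findall(r"\w+", text)]


print(approximate_standard_analyzer("John"))        # ['john']
print(approximate_standard_analyzer("John Smith"))  # ['john', 'smith']
```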

As you can see, it auto-lowercases any input text. This means that when you ingest a single doc into a new index called indexname:

POST indexname/_doc
{
  "username": "John"
}

you can then search lowercase tokens straightaway:

GET indexname/_search
{
  "query": {
    "query_string": {
      "default_field": "username",
      "query": "john*"
    }
  }
}

As a matter of fact, you don't even need the wildcard * at the end.
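
This also hints at why the query in the question did not return the expected result. Assuming the users index relies on the default dynamic mapping (username as an analyzed text field with a username.keyword subfield that stores the value verbatim), a lowercase search term only matches the analyzed field. The Python sketch below is a toy model of that difference; the helper names are hypothetical and wildcards are deliberately left out.

```python
import re


def index_terms(value, analyzed):
    """Return the terms a field would hold for `value` in this toy model."""
    if analyzed:
        # text field: tokenized and lowercased at index time
        return {token.lower() for token in re.findall(r"\w+", value)}
    # keyword field: the whole value is kept exactly as given
    return {value}


def term_matches(query_term, value, analyzed):
    """Check whether `query_term` would hit a field holding `value`."""
    terms = index_terms(value, analyzed)
    # For an analyzed field the query text is lowercased as well;
    # for a keyword field it is compared verbatim.
    needle = query_term.lower() if analyzed else query_term
    return needle in terms


print(term_matches("john", "John", analyzed=True))   # True  (like username)
print(term_matches("john", "John", analyzed=False))  # False (like username.keyword)
print(term_matches("John", "John", analyzed=False))  # True  (exact case only)
```

Under that mapping assumption, pointing default_field at username instead of username.keyword is what makes the case-insensitive match work.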

