How to implement "lowercase" in an ElasticSearch query?


asked Feb 19, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am trying to implement lowercase functionality in ElasticSearch. According to their API docs here

https://www.elastic.co/guide/en/elasticsearch/reference/current/lowercase-processor.html

you use this code snippet

{
  "lowercase": {
   "field": "foo"
  }
}

in the query to get a lowercase value contained in the required field.

The docs do not include a complete example, and I keep getting an error when I execute a search query.

This is what I tried:

 POST /users/_search
 {
  "size" : 10,
  "_source" : {
   "includes" : [
    "userid",
    "username" 
   ]
  },
  "query" : {
      "query_string" : {
          "query" : "*John*",
           "lowercase": { "default_field" : "username.keyword"}
      }
   },
 "sort" : [
   {
     "_doc" : {
      "order" : "desc"
    }
  }
 ]
}

In the query above, I am trying to match the username 'john', that is, 'John' converted to lowercase.

Error message is as follows:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "[query_string] unknown token [START_OBJECT] after [lowercase]",
        "line" : 18,
        "col" : 27
      }
    ],
    "type" : "parsing_exception",
    "reason" : "[query_string] unknown token [START_OBJECT] after [lowercase]",
    "line" : 18,
    "col" : 27
  },
  "status" : 400
}

The same query works (although it does not give me the result that I need) if I replace

 "lowercase": { "default_field" : "username.keyword"}

with

  "default_field" : "username.keyword"

Any suggestions about how I can fix this query? Thanks!

1 Answer

深蓝 · Answer 1 · 2021-02-19T03:40:09+0000

The lowercase processor you're referencing is part of an ingest pipeline, which lets you transform your data before it gets indexed. It has nothing to do with querying data, which is why query_string rejects it as an unknown token.

You can simulate an ingest pipeline like so:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "lowercase": {
          "field": "username"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "username": "John"
      }
    }
  ]
}

Check this answer to see pipelines in action.
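
To apply the processor to real documents rather than simulated ones, the pipeline can be stored and then referenced at index time. A minimal sketch, assuming a hypothetical pipeline name lowercase-username and the users index and userid field from the question:

```
# Store the pipeline under a name of your choosing (lowercase-username is illustrative)
PUT _ingest/pipeline/lowercase-username
{
  "description": "Lowercase the username field at ingest time",
  "processors": [
    {
      "lowercase": {
        "field": "username"
      }
    }
  ]
}

# Index a document through that pipeline
POST users/_doc?pipeline=lowercase-username
{
  "userid": 1,
  "username": "John"
}
```

Note that this rewrites the stored value itself, so the original casing is lost unless you keep it in a separate field.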

As you ingest textual data into Elasticsearch, it gets analyzed and tokenized. The default analyzer is the standard analyzer and you can see how it'd tokenize the word "John" when you run:

GET _analyze
{
  "text": "John",
  "analyzer": "standard"
}
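
For reference, the response to the request above should look roughly like the commented line below (formatting may vary by version). The second request is an addition for contrast and not part of the original answer: it runs the same text through the built-in keyword analyzer, which emits the input as a single unmodified token. That approximates how a keyword field without a normalizer, such as username.keyword under default dynamic mapping, stores its value.

```
# Response to the standard-analyzer request above (roughly):
# { "tokens": [ { "token": "john", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 } ] }

# For contrast: the keyword analyzer keeps the text as one token with its case intact
GET _analyze
{
  "text": "John",
  "analyzer": "keyword"
}
# Token returned here: "John"
```

The standard analyzer produces john, while the keyword analyzer leaves John untouched.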

As you can see, the standard analyzer lowercases every token it produces. This means that when you ingest a single doc into a new index called indexname:

POST indexname/_doc
{
  "username": "John"
}

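you can then search with lowercase terms straightaway, as long as you query the analyzed username field rather than username.keyword, which keeps the original value untouched: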

GET indexname/_search
{
  "query": {
    "query_string": {
      "default_field": "username",
      "query": "john*"
    }
  }
}

As a matter of fact, you don't even need the wildcard * at the end, since the term john already matches the indexed token exactly.
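
Applied to the query from the question, the fix might look like the first request below. It is a sketch that assumes username in the users index is a text field analyzed with the standard analyzer, which the presence of a username.keyword sub-field suggests but does not guarantee. The second request is an alternative that is not in the original answer: if you have to search username.keyword, the wildcard query accepts a case_insensitive flag in Elasticsearch 7.10 and later.

```
# Query the analyzed field with a lowercase pattern
POST /users/_search
{
  "size": 10,
  "_source": {
    "includes": ["userid", "username"]
  },
  "query": {
    "query_string": {
      "default_field": "username",
      "query": "*john*"
    }
  },
  "sort": [
    { "_doc": { "order": "desc" } }
  ]
}

# Alternative (7.10+): case-insensitive wildcard against the keyword sub-field
POST /users/_search
{
  "query": {
    "wildcard": {
      "username.keyword": {
        "value": "*john*",
        "case_insensitive": true
      }
    }
  }
}
```

Leading wildcards such as *john* can be slow on large indices, so prefer an exact term or a trailing wildcard where that is enough.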
