Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.7k views
in Technique[技术] by (71.8m points)

elasticsearch - Elastic search apply boost based on nested field value

Below is my indexed document

 {
  "defaultBoostValue":1.01,
  "boostDetails": [
    {
      "Type": "Type1",
      "value": 1.0001
    },
    {
      "Type": "Type2",
      "value": 1.002
    },
    {
      "Type": "Type3",
      "value": 1.0005
    }
  ]
}

i want to apply boost based on value passed, so suppose i pass Type 1 then boost applied will be 1.0001 and if that Type1 does not exist then it will use defaultBoostValue below is my query which works but quite slow, is there any way to optimize it further

Original question Above query works but is slow as we are using _source

   {
  "query": {
    "function_score": {
      "boost_mode": "multiply",
      "functions": [
          "script_score": {
            "script": {
              "source": """
                double findBoost(Map params_copy) {
                    for (def group : params_copy._source.boostDetails) {
                        if (group['Type'] == params_copy.preferredBoostType ) {
                            return group['value'];
                        }
                    }
                    return params_copy._source['defaultBoostValue'];
                }
                
                return findBoost(params)
              """,
              "params": {
                "preferredBoostType": "Type1"
              }
            }
          }
        }
      ]
    }
  }
}

I have removed the condition of not having dynamic mapping, if changing the structure of boostDetails mapping can help then I am ok but please explain how it can help and be faster to query also please give mapping types and modified structure if answer contains modifying mapping.

question from:https://stackoverflow.com/questions/65850992/elastic-search-apply-boost-based-on-nested-field-value

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Using dynamic mappings (lots of fields)

It looks like you adjusted the doc structure compared to your original question.

The query above was thought for nested fields which cannot be easily iterated in a script for performance reasons. Having said that, the above is an even slower workaround which accesses the docs' _source and iterates its contents. But keep in mind that it's not recommended to access the _source in scripts!

If your docs aren't nested anymore, you can access the so-called doc values which are much more optimized for query-time access:

{
  "query": {
    "function_score": {
      ...
      "functions": [
        {
          ...
          "script_score": {
            "script": {
              "lang": "painless",
              "source": """
                try {
                  if (doc['boost.boostType.keyword'].value == params.preferredBoostType) {
                    return doc['boost.boostFactor'].value;
                  } else {
                    throw new Exception();
                  }
                } catch(Exception e) {
                  return doc['fallbackBoostFactor'].value;
                }
              """,
              "params": {
                "preferredBoostType": "Type1"
              }
            }
          }
        }
      ]
    }
  }
}

thus speeding up your function score query.


Alternative using an ordered list of values

Since the nested iteration is slow and dynamic mappings are blowing up your index, you could store your boosts in a standardized ordered list in each document:

"boostValues": [1.0001, 1.002, 1.0005, ..., 1.1]

and keep track of the corresponding boost types' order in the backend where you construct the queries:

var boostTypes = ["Type1", "Type2", "Type3", ..., "TypeN"]

So something like n-hot vectors.

Then, as you construct the Elasticsearch query, you'd look up the array index of the boostValues based on the boostType and pass this array index to the script query from above which'd access the corresponding boostValues doc-value.

This is guaranteed to be faster than _source access. But it's required that you always keep your boostTypes and boostValues in sync -- preferably append-only (as you add new boostTypes, the list grows in one dimension).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...