Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Troubleshooting Elasticsearch returning unstable results. Can I query only a particular node in the cluster?

I query Elasticsearch every 30 minutes for the count of certain events collected in the last 30 minutes. I expect results in the range of 6 to 8, but from time to time Elasticsearch returns 0, which is a false result. (I can see the events in Kibana, and there is no deviation from the range of 6-8 messages per 30 minutes.)

The query ran correctly for months with no false results. One week ago it started returning incorrect results.

I am using Elasticsearch version 7.3.1.

What I have researched so far

Were there any Elasticsearch upgrades/patches around the time it started? The service provider told me there were not.

Was there an increase in Elasticsearch usage (ingestion/queries)? I see no deviation from last month's averages.

Is the status of the cluster green? Yes, the cluster is in green health status.

Do all the indexes have a green status? Yes, all the indexes have the green health status.

Here is my query:

POST /*parsed*/_search?size=0&filter_path=hits.total.value
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "app": {
              "query": "billing-processor"
            }
          }
        },
        {
          "match_phrase": {
            "message": {
              "query": "BillingRecords and finished"
            }
          }
        },
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "now-30m",
              "lte": "now"
            }
          }
        }
      ]
    }
  }
}
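
One thing worth noting about the request above: filter_path=hits.total.value strips the _shards and timed_out fields from the response, so a search that timed out or had failed shards would look the same as a genuine count of 0. A minimal Python sketch of a check, assuming filter_path is widened to hits.total.value,_shards,timed_out; the helper name and the sample responses are hypothetical and not taken from the real cluster:

```python
def trusted_count(response):
    """Return the hit count, or None if the response looks partial."""
    # A search that timed out may have stopped before counting all hits.
    if response.get("timed_out", False):
        return None
    # Failed shards mean their documents are missing from the count.
    if response.get("_shards", {}).get("failed", 0) > 0:
        return None
    return response["hits"]["total"]["value"]


# Hypothetical responses in the shape of an Elasticsearch 7.x search response.
complete = {
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
    "hits": {"total": {"value": 7, "relation": "eq"}},
}
partial = {
    "timed_out": False,
    "_shards": {"total": 3, "successful": 2, "skipped": 0, "failed": 1},
    "hits": {"total": {"value": 0, "relation": "eq"}},
}

print(trusted_count(complete))
print(trusted_count(partial))
```

If the zero counts coincide with failed shards or timeouts, the problem is more likely partial results than a wrong answer from a healthy node.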

The Elasticsearch cluster has 3 nodes (each with all 3 roles: ingest, master, data) and a replication factor of 2 for index shards.

I looked at the advanced analytics for the Elasticsearch cluster nodes, and I see suspicious JVM heap charts for 2 of the nodes:

node instance-0000000000

node instance-0000000002

(the 3rd node has a very smooth chart)

And these are the times when the query returned false results (blue dots) compared to all executions (green bars):

(chart not preserved)

I suspect that when the query hits a certain node, the result is sometimes incorrect, while on the other nodes the query is always correct.

Question: How can I query only a particular node of Elasticsearch? (The cluster is provided as a service and I only have access to a load balancer FQDN for the cluster: https://182a33a5c22a4db7840457a555773ed5.elasticsearch.my-service-provider.net.) Can I provide hints in the query or request metadata that I want my results delivered by a certain node?



1 Answer


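I think I found the answer in the Elasticsearch documentation, on the Search shard routing page.

The routing query parameter can be provided in _search requests. Note that routing selects shards rather than nodes; the preference query parameter (for example preference=_only_nodes:<node-id>) is the one that restricts a search to particular nodes. A routing value is first assigned when a document is indexed, as in this example from the documentation, and the same value can then be passed to _search: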

POST /my-index-000001/_doc?routing=my-routing-value
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
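
To tie this back to the question, here is a minimal Python sketch that builds a count-only _search URL pinned to one node through the preference parameter, or to a routing value. It assumes that GET /_nodes is reachable through the load balancer and that the preference value _only_nodes:<node-id> is honored by the cluster's version. The host name, the node IDs, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode


def node_ids_by_name(nodes_response):
    """Map node name -> node ID from a GET /_nodes response body."""
    return {info["name"]: node_id
            for node_id, info in nodes_response["nodes"].items()}


def build_search_url(base_url, index, node_id=None, routing=None):
    """Build a count-only _search URL, optionally pinned to a node or routing value."""
    params = {"size": 0, "filter_path": "hits.total.value"}
    if node_id is not None:
        # Restrict the search to shard copies held by this node.
        params["preference"] = f"_only_nodes:{node_id}"
    if routing is not None:
        # Restrict the search to the shard(s) that this routing value maps to.
        params["routing"] = routing
    path = quote(index, safe="*,")  # keep wildcards in the index pattern readable
    return f"{base_url.rstrip('/')}/{path}/_search?{urlencode(params, safe=':')}"


# Hypothetical, abbreviated GET /_nodes response; the IDs are made up.
sample_nodes = {
    "nodes": {
        "aaa111": {"name": "instance-0000000000"},
        "bbb222": {"name": "instance-0000000002"},
    }
}
ids = node_ids_by_name(sample_nodes)
url = build_search_url("https://es.example.net", "*parsed*",
                       node_id=ids["instance-0000000002"])
print(url)
```

Running the same query body against each node ID in turn and comparing the counts would show whether the zero results really come from one node.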

Still, for the particular issue described in the question, it looks increasingly likely that node/index caching is causing the problem. The search query uses size=0, which Elasticsearch apparently treats in a special way. The documentation of the request_cache query parameter of _search hints at this:

(Optional, Boolean) If true, the caching of search results is enabled for requests where size is 0. See Shard request cache settings. Defaults to index level settings.

and in the Shard request cache settings documentation page:

By default, the requests cache will only cache the results of search requests where size=0, so it will not cache hits, but it will cache hits.total, aggregations, and suggestions.

Most queries that use now (see Date Math) cannot be cached.

Scripted queries that use the API calls which are non-deterministic, such as Math.random() or new Date() are not cached.
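
Since the query in the question uses now, the quotes above suggest it would normally not be cached at all, so the caching theory is worth testing rather than assuming. A minimal Python sketch, under the assumption that setting request_cache explicitly on the request is enough to rule the shard request cache in or out; the host name, index name, and helper name are hypothetical, and only the time filter of the original query is reproduced:

```python
import json
from urllib.parse import urlencode

# Time filter from the question; the other clauses are omitted for brevity.
QUERY = {"query": {"range": {"@timestamp": {"gte": "now-30m", "lte": "now"}}}}


def count_request(base_url, index, use_cache):
    """Return (url, body bytes) for a count-only search with an explicit cache setting."""
    params = {
        "size": 0,
        "filter_path": "hits.total.value",
        "request_cache": "true" if use_cache else "false",
    }
    url = f"{base_url}/{index}/_search?{urlencode(params)}"
    return url, json.dumps(QUERY).encode("utf-8")


url, body = count_request("https://es.example.net", "my-index-000001", use_cache=False)
print(url)
```

If the zero counts persist with request_cache=false, the shard request cache can be dropped as a suspect.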

