Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Elasticsearch aggregation by max date giving wrong results

I want to group by sysCode and, within each sysCode, keep only the documents with the latest date, then aggregate on the employeeId and type fields. The query below does not return the desired results: for GER it returns employeeId=1 and for IND it returns employeeId=3, which I do not want.

Sample JSON documents:

{
  "sysCode": "GER",
  "employeeId": 1,
  "date": "2014-06-14",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "GER",
  "employeeId": 2,
  "date": "2014-06-15",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "IND",
  "employeeId": 3,
  "date": "2014-06-16",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "IND",
  "employeeId": 3,
  "date": "2014-06-16",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "MISSED"
      }
    ]
  }
}
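To make the requirement concrete, here is a minimal plain-Python sketch of the intended logic, run against the four sample documents above. It does not talk to Elasticsearch, it reflects only one reading of the requirement (keep every document that carries the latest date of its sysCode), and the helper name `latest_per_sys_code` is hypothetical.

```python
from collections import Counter, defaultdict

# The four sample documents from the question.
docs = [
    {"sysCode": "GER", "employeeId": 1, "date": "2014-06-14",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "GER", "employeeId": 2, "date": "2014-06-15",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "IND", "employeeId": 3, "date": "2014-06-16",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "IND", "employeeId": 3, "date": "2014-06-16",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "MISSED"}]}},
]


def latest_per_sys_code(documents):
    """Hypothetical helper: group by sysCode, keep only the documents that
    carry the group's latest date, then count types per employeeId."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["sysCode"]].append(doc)

    result = {}
    for sys_code, group in groups.items():
        # ISO-8601 dates (YYYY-MM-DD) compare correctly as plain strings.
        max_date = max(doc["date"] for doc in group)
        latest = [doc for doc in group if doc["date"] == max_date]

        # employeeId -> Counter of categories.pb.type values
        employees = defaultdict(Counter)
        for doc in latest:
            for pb in doc["categories"]["pb"]:
                employees[doc["employeeId"]][pb["type"]] += 1

        result[sys_code] = {
            "max_date": max_date,
            "employees": {emp: dict(cnt) for emp, cnt in employees.items()},
        }
    return result


expected = latest_per_sys_code(docs)
print(expected)
```

On the sample data this sketch keeps employeeId 2 (LATE) for GER and employeeId 3 (LATE and MISSED) for IND.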

Aggregation query:

{
  "aggs": {
    "result_by_sys_code": {
      "terms": {
        "field": "sysCode"
      },
      "aggs": {
        "max_as_of_date": {
          "max": {
            "field": "date"
          }
        },
        "employees": {
          "terms": {
            "field": "employeeId"
          },
          "aggs": {
            "nested": {
              "nested": {
                "path": "categories.pb"
              },
              "aggs": {
                "metrics": {
                  "terms": {
                    "field": "categories.pb.type.keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Mappings:

{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      },
      "categories": {
        "properties": {
          "pb": {
            "type": "nested",
            "properties": {
              "metric": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "type": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      },
      "controlCode": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
Question from: https://stackoverflow.com/questions/66059413/elasticsearch-aggregation-by-max-date-giving-wrong-results


1 Answer


I think the top_hits aggregation can meet your requirement. From the official documentation:

This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "type",
        "size": 3
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "date", "price" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}
'

In this example, sales are grouped by type, and for each type the most recent sale is shown. For each sale, only the date and price fields are included in the source.
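Adapted to the fields in the question, the same pattern might look like the sketch below, which only builds the request body as a Python dict and prints it as JSON. The function name `build_latest_per_sys_code_query` and the aggregation name `latest_doc` are hypothetical. The sketch also assumes that `sysCode` can be used in a terms aggregation, as the question's own query does; the mapping shown does not list that field, so `sysCode.keyword` may be needed instead.

```python
import json


def build_latest_per_sys_code_query(group_field="sysCode", date_field="date"):
    """Hypothetical helper: one terms bucket per sysCode, each holding the
    single most recent document via top_hits sorted by date descending."""
    return {
        "size": 0,  # no search hits needed, only the aggregations
        "aggs": {
            "result_by_sys_code": {
                "terms": {"field": group_field},
                "aggs": {
                    "latest_doc": {
                        "top_hits": {
                            "sort": [{date_field: {"order": "desc"}}],
                            "_source": {
                                "includes": [
                                    "sysCode",
                                    "employeeId",
                                    "date",
                                    "categories.pb.type",
                                ]
                            },
                            "size": 1,
                        }
                    }
                },
            }
        },
    }


body = build_latest_per_sys_code_query()
print(json.dumps(body, indent=2))
```

Two caveats apply. top_hits does not accept sub-aggregations, so grouping by employeeId and type would have to be done on the returned hits. With `"size": 1`, only one document per sysCode comes back even when several share the latest date, as the two IND documents in the sample do; a larger size followed by filtering on the top date is one possible workaround.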

The sampler aggregation does something similar, but in a different way. From the documentation:

A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.

See the question "Limit ElasticSearch aggregation to top n query results" for a TL;DR on the sampler aggregation.

