Let's say I have a single index called /recipes. The mappings define the fields contributor and dish_name as keywords (dish_name is always a single word, like "pancakes").
I have multiple recipes and multiple contributors of recipes (docs), but I'm really interested in those from Martha and Shane.
Moreover, I'd like to find out, for just these two individuals, what percentage of each one's distinct dish names is unique to them, i.e. does not overlap with the dish names the other has contributed.
E.g., they each could have contributed multiple different recipes that are all for dishes named "pancakes."
I imagine I want to find all the recipes where contributor:Martha and then get the count of unique dish names (if Martha has multiple recipes for pancakes, I only want one of those to count). Then I would do the same for Shane. Finally, I need a way to compare these results against each other.
In SQL land this sounds like I want a left outer join. In ES, I've tried filter aggregations, terms aggregations, sub-aggregations, pipeline aggregations. However, I can't seem to find just the right combo to get a single query to do what I want.
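To make the attempt concrete, here is a rough sketch of one combination: a named filters aggregation with a terms sub-aggregation on dish_name, built as a Python dict so the body can be inspected before sending it. The aggregation names by_contributor and dishes, the helper build_dish_query, and the max_dishes limit are my own placeholders, and this only collects each contributor's distinct dish names; the comparison between the two buckets would still have to happen somewhere else.

```python
import json


def build_dish_query(contributors, max_dishes=1000):
    """Build a search body with one named filter bucket per contributor.

    Each bucket carries a terms sub-aggregation listing that contributor's
    distinct dish names. max_dishes is an assumed upper bound on how many
    distinct dish names a single contributor can have.
    """
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "aggs": {
            "by_contributor": {
                "filters": {
                    "filters": {
                        name: {"term": {"contributor": name}}
                        for name in contributors
                    }
                },
                "aggs": {
                    "dishes": {
                        "terms": {"field": "dish_name", "size": max_dishes}
                    }
                },
            }
        },
    }


body = build_dish_query(["Martha", "Shane"])
print(json.dumps(body, indent=2))
```

The printed JSON is what would go in the body of a search request against /recipes.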
Example data:
recipes: [
  {
    _id: 1,
    dish_name: pancakes,
    contributor: Martha,
    ingredients: who cares
  },
  {
    _id: 2,
    dish_name: pancakes,
    contributor: Shane,
    ingredients: still doesn't matter
  },
  {
    _id: 3,
    dish_name: pancakes,
    contributor: Martha,
    ingredients: totally diff from id 1
  },
  {
    _id: 4,
    dish_name: souffle,
    contributor: Martha,
    ingredients: souffle stuff
  },
  {
    _id: 5,
    dish_name: pie,
    contributor: Shane,
    ingredients: pie stuff
  }
]
I would expect a total pool of 4 dish_names: 2 distinct dishes contributed by Martha (pancakes and souffle) and 2 distinct dishes contributed by Shane (pancakes and pie). The final outcome would reflect 50% uniqueness for each of them, because each has contributed one dish the other also contributed (non-unique) and one dish that only they contributed.
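To pin down the arithmetic, here is the expected result computed client-side in plain Python over the sample documents (ingredients omitted). This is only a statement of the result I'm after, not an Elasticsearch solution, and the function name uniqueness_percentages is made up for illustration.

```python
from collections import defaultdict

recipes = [
    {"_id": 1, "dish_name": "pancakes", "contributor": "Martha"},
    {"_id": 2, "dish_name": "pancakes", "contributor": "Shane"},
    {"_id": 3, "dish_name": "pancakes", "contributor": "Martha"},
    {"_id": 4, "dish_name": "souffle", "contributor": "Martha"},
    {"_id": 5, "dish_name": "pie", "contributor": "Shane"},
]


def uniqueness_percentages(docs, contributors):
    """Percentage of each contributor's distinct dishes that no other
    listed contributor has also contributed."""
    dishes = defaultdict(set)
    for doc in docs:
        if doc["contributor"] in contributors:
            # A set collapses repeated dish names (two pancake recipes count once).
            dishes[doc["contributor"]].add(doc["dish_name"])

    result = {}
    for name in contributors:
        own = dishes[name]
        others = set().union(*(dishes[o] for o in contributors if o != name))
        exclusive = own - others
        result[name] = 100.0 * len(exclusive) / len(own) if own else 0.0
    return result


percentages = uniqueness_percentages(recipes, ["Martha", "Shane"])
print(percentages)  # {'Martha': 50.0, 'Shane': 50.0}
```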
Is this doable in a single query? Does it require multiple queries? I'm on ES 5.6, by the way, so I may not have some of the more modern options such as the composite aggregation.
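If the comparison can't happen inside Elasticsearch, the fallback I can picture is a small post-processing step on the client. The sketch below assumes a response shaped the way a named filters aggregation with a terms sub-aggregation is returned; sample_response is hand-written from the example data rather than captured from a real cluster, and the aggregation names by_contributor and dishes are placeholders.

```python
# Hand-written stand-in for the "aggregations" part of a search response.
sample_response = {
    "aggregations": {
        "by_contributor": {
            "buckets": {
                "Martha": {
                    "doc_count": 3,
                    "dishes": {
                        "buckets": [
                            {"key": "pancakes", "doc_count": 2},
                            {"key": "souffle", "doc_count": 1},
                        ]
                    },
                },
                "Shane": {
                    "doc_count": 2,
                    "dishes": {
                        "buckets": [
                            {"key": "pancakes", "doc_count": 1},
                            {"key": "pie", "doc_count": 1},
                        ]
                    },
                },
            }
        }
    }
}


def dish_sets(response, agg="by_contributor", sub_agg="dishes"):
    """Map each contributor bucket to the set of dish names it contains."""
    buckets = response["aggregations"][agg]["buckets"]
    return {
        name: {b["key"] for b in bucket[sub_agg]["buckets"]}
        for name, bucket in buckets.items()
    }


sets_by_contributor = dish_sets(sample_response)
# Dishes that every contributor in the response has in common.
shared = set.intersection(*sets_by_contributor.values())
print(sorted(shared))  # ['pancakes']
```

From these sets, the per-contributor percentages are a set difference and a division away, but that part would run outside of ES.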
question from:
https://stackoverflow.com/questions/65850968/aggregate-results-of-separate-elasticsearch-queries-with-their-own-aggs