ElasticSearch - Average aggregation/sort over multivalued non-unique numeric fields
I'm trying to handle sorting on the average of a multivalued field called 'rating_average'. In the example I'm giving you, the values of the field are [1, 2, 2]. I'm expecting the average to be (1+2+2)/3 = 1.66666667. In reality I'm getting an average of 1.5.
After a few tests, and after analyzing the extended stats, I've discovered that this happens because the average is calculated on the distinct values only, so duplicates are ignored: the statistical operators are applied to the set [1, 2] instead of to [1, 2, 2]. I've confirmed this by adding an aggregations section to the query, to double-check that the average calculated in the sort block is identical to the one in the stats aggregation (1.5 in both).
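The discrepancy can be reproduced outside Elasticsearch with plain arithmetic. The following Python sketch contrasts the mean over every value with the mean over the distinct values, which is what the sort appears to be using. The helper names `mean_all` and `mean_distinct` are hypothetical, purely for illustration.

```python
def mean_all(values):
    """Mean over every value, duplicates included (what I expected)."""
    return sum(values) / len(values)


def mean_distinct(values):
    """Mean over the distinct values only (what the sort seems to compute)."""
    distinct = set(values)
    return sum(distinct) / len(distinct)


ratings = [1, 2, 2]
print(f"{mean_all(ratings):.8f}")  # 1.66666667, i.e. (1 + 2 + 2) / 3
print(mean_distinct(ratings))      # 1.5, i.e. (1 + 2) / 2
```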
An example document is the following:
{
  "_source": {
    "content_uri": "http://data.semint.co.uk/resource/testcontent1",
    "rating_average": ["1", "2", "2"],
    "fordesk": "http://data.semint.co.uk/resource/kmfmjd1rtkd"
  }
}
The query I'm performing is the following:
{
  "from": 0,
  "size": 20,
  "aggs": {
    "rating_stats": {
      "extended_stats": { "field": "rating_average" }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "mediatype": ["http://data.semint.co.uk/resource/testmediatype3"],
                "execution": "and"
              }
            }
          ]
        }
      }
    }
  },
  "fields": ["content_uri", "rating_average"],
  "sort": [
    { "rating_average": { "order": "desc", "mode": "avg" } }
  ]
}
And these are the results of executing the query on the aforementioned document:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": null,
    "hits": [
      {
        "_index": "travel_content6",
        "_type": "semantic-index",
        "_id": "http://data.semint.co.uk/resource/testcontent1",
        "_score": null,
        "fields": {
          "content_uri": ["http://data.semint.co.uk/resource/testcontent1"],
          "rating_average": [1, 2, 2]
        },
        "sort": [1.5]
      }
    ]
  },
  "aggregations": {
    "rating_stats": {
      "count": 2,
      "min": 1,
      "max": 2,
      "avg": 1.5,
      "sum": 3,
      "sum_of_squares": 5,
      "variance": 0.25,
      "std_deviation": 0.5,
      "std_deviation_bounds": { "upper": 2.5, "lower": 0.5 }
    }
  }
}
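As a sanity check, the numbers in the `rating_stats` aggregation can be recomputed by hand. This Python sketch assumes the usual population formulas (variance = sum_of_squares / count - avg², bounds = avg ± 2σ); `extended_stats` here is a hypothetical helper, not an Elasticsearch API. Fed the deduplicated set [1, 2], it reproduces every value in the response; fed [1, 2, 2], it gives count 3 and avg 1.6666..., which is what I expected.

```python
import math


def extended_stats(values, sigma=2.0):
    """Hypothetical re-implementation of the extended_stats numbers.

    Assumes population variance (sum_of_squares / count - avg**2) and
    std_deviation_bounds = avg +/- sigma * std_deviation, with sigma = 2.
    """
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_sq = sum(v * v for v in values)
    variance = sum_sq / count - avg * avg
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_sq,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


# Deduplicated values: matches the rating_stats block in the response above.
print(extended_stats(sorted(set([1, 2, 2]))))
# All values, duplicates included: what I expected to see.
print(extended_stats([1, 2, 2]))
```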