ElasticSearch - Average aggregation/sort over multivalued non-unique numeric fields


I am trying to sort on the average of a multivalued field called 'rating_average'. In the example I'm giving you, the values of the field are [1, 2, 2]. I'm expecting the average to be (1+2+2)/3 = 1.66666667. In reality I'm getting 1.5 as the average.

After a few tests, and after analyzing the extended stats, I've discovered that this happens because duplicate values are dropped before the average is calculated. The statistical operators are applied to the set [1, 2] instead of [1, 2, 2]. I've confirmed this by adding an aggregations section to the query, to double-check that the average calculated in the sort block is identical to the one in the stats aggregation.
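
To make the arithmetic concrete, here is a small Python sketch of the two averages. It is plain Python, not Elasticsearch code, and the variable names are only illustrative. It assumes the observed behaviour amounts to dropping duplicate values before averaging.

```python
from statistics import mean

# Values of rating_average as stored in the example document.
values = [1, 2, 2]

# Average I expect: every value counts, duplicates included.
expected_avg = mean(values)

# Average I observe: assumed equivalent to averaging the distinct values only.
observed_avg = mean(sorted(set(values)))

print(round(expected_avg, 8))  # 1.66666667
print(observed_avg)            # 1.5
```

The second number matches both the sort value and the "avg" reported by the stats aggregation.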

An example document is the following:

{
  "_source": {
    "content_uri": "http://data.semint.co.uk/resource/testcontent1",
    "rating_average": [
      "1",
      "2",
      "2"
    ],
    "fordesk": "http://data.semint.co.uk/resource/kmfmjd1rtkd"
  }
}

The query I'm performing is the following:

{
  "from": 0,
  "size": 20,
  "aggs": {
    "rating_stats": {
      "extended_stats": {
        "field": "rating_average"
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "mediatype": [
                  "http://data.semint.co.uk/resource/testmediatype3"
                ],
                "execution": "and"
              }
            }
          ]
        }
      }
    }
  },
  "fields": ["content_uri", "rating_average"],
  "sort": [
    {
      "rating_average": {
        "order": "desc",
        "mode": "avg"
      }
    }
  ]
}

And these are the results of executing the query on the aforementioned document:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": null,
    "hits": [
      {
        "_index": "travel_content6",
        "_type": "semantic-index",
        "_id": "http://data.semint.co.uk/resource/testcontent1",
        "_score": null,
        "fields": {
          "content_uri": [
            "http://data.semint.co.uk/resource/testcontent1"
          ],
          "rating_average": [1, 2, 2]
        },
        "sort": [
          1.5
        ]
      }
    ]
  },
  "aggregations": {
    "rating_stats": {
      "count": 2,
      "min": 1,
      "max": 2,
      "avg": 1.5,
      "sum": 3,
      "sum_of_squares": 5,
      "variance": 0.25,
      "std_deviation": 0.5,
      "std_deviation_bounds": {
        "upper": 2.5,
        "lower": 0.5
      }
    }
  }
}
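
As a cross-check, every number in the rating_stats block above can be reproduced by hand from the two distinct values [1, 2] rather than from [1, 2, 2]. The Python sketch below is my own illustration, not Elasticsearch's implementation: the function name extended_stats is hypothetical, and it assumes population variance and bounds of avg plus or minus 2 standard deviations, which is what the figures in the response are consistent with.

```python
import math

def extended_stats(values, sigma=2.0):
    """Recompute the extended_stats figures for a plain list of numbers.

    Assumes population variance and bounds of avg +/- sigma * std_deviation.
    """
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    variance = sum_of_squares / count - avg * avg
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }

# Distinct values only: reproduces the aggregation output shown above.
deduplicated = extended_stats([1, 2])
print(deduplicated["count"], deduplicated["sum"], deduplicated["avg"])  # 2 3 1.5

# All stored values: what I would have expected the aggregation to use.
full = extended_stats([1, 2, 2])
print(full["count"], full["sum"], full["sum_of_squares"])  # 3 5 9
```

With all three values the count would be 3, the sum 5 and the sum of squares 9, none of which appear in the response, so the aggregation really is seeing only two values.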

