sql - GroupAggregate for Subquery in Redshift/PostgreSQL -


i've noticed strange behavior in query optimizer redshift, , i'm wondering if can explain or point out workaround.

for large group by queries, it's pretty essential optimizer plan groupaggregate rather hashaggregate, doesn't try fit temporary results in memory. works fine me in general. when try use group by subquery, switches hashaggregate.

for example, consider following query.

select install_app_version, user_id, max(platform) plat dailies group install_app_version, user_id; 

the table dailies has sortkeys (install_app_version, user_id) , distkey (user_id). hence groupaggregate possible, , query plan looks this, should.

xn groupaggregate  (cost=0.00..184375.32 rows=1038735 width=51)   ->  xn seq scan on daily_players  (cost=0.00..103873.42 rows=10387342 width=51) 

in contrast, if use above in subquery of other query, hashaggregate. example, simple as

select count(1) (   select install_app_version, user_id, max(platform) plat     daily_players     group install_app_version, user_id ); 

has query plan

xn aggregate  (cost=168794.32..168794.32 rows=1 width=0)   ->  xn subquery scan derived_table1  (cost=155810.13..166197.48 rows=1038735 width=0)         ->  xn hashaggregate  (cost=155810.13..155810.13 rows=1038735 width=39)               ->  xn seq scan on daily_players  (cost=0.00..103873.42 rows=10387342 width=39) 

the same pattern persists no matter in outer query. can group install_app_version , user_id, can take aggregates, can no grouping @ externally. sorting inner query nothing.

in cases i've shown it's not such big deal, i'm joining several subqueries own group by, doing aggregates on - gets out of hand , slow without groupaggregate.

if has wisdom query optimizer , can answer this, it'd appreciated! thanks!


Comments

Popular posts from this blog

java - Static nested class instance -

c# - Bluetooth LE CanUpdate Characteristic property -

JavaScript - Replace variable from string in all occurrences -