sql - GroupAggregate for Subquery in Redshift/PostgreSQL -
i've noticed strange behavior in query optimizer redshift, , i'm wondering if can explain or point out workaround.
for large group by
queries, it's pretty essential optimizer plan groupaggregate rather hashaggregate, doesn't try fit temporary results in memory. works fine me in general. when try use group by
subquery, switches hashaggregate.
for example, consider following query.
select install_app_version, user_id, max(platform) plat dailies group install_app_version, user_id;
the table dailies has sortkeys (install_app_version, user_id) , distkey (user_id). hence groupaggregate possible, , query plan looks this, should.
xn groupaggregate (cost=0.00..184375.32 rows=1038735 width=51) -> xn seq scan on daily_players (cost=0.00..103873.42 rows=10387342 width=51)
in contrast, if use above in subquery of other query, hashaggregate. example, simple as
select count(1) ( select install_app_version, user_id, max(platform) plat daily_players group install_app_version, user_id );
has query plan
xn aggregate (cost=168794.32..168794.32 rows=1 width=0) -> xn subquery scan derived_table1 (cost=155810.13..166197.48 rows=1038735 width=0) -> xn hashaggregate (cost=155810.13..155810.13 rows=1038735 width=39) -> xn seq scan on daily_players (cost=0.00..103873.42 rows=10387342 width=39)
the same pattern persists no matter in outer query. can group install_app_version , user_id, can take aggregates, can no grouping @ externally. sorting inner query nothing.
in cases i've shown it's not such big deal, i'm joining several subqueries own group by
, doing aggregates on - gets out of hand , slow without groupaggregate.
if has wisdom query optimizer , can answer this, it'd appreciated! thanks!
Comments
Post a Comment