Is postgres able to share sorts required by common partition window functions?

From: Sebastien Arod <sebastien(dot)arod(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Is postgres able to share sorts required by common partition window functions?
Date: 2020-07-06 16:43:31
Message-ID: CADd42iFZWwYNsXjEM_3HWK3QnfiCrMNmpOkZqyBQCabnVxOPtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

I'm trying to optimize the following query on postgres 11.6 (running on
Aurora)
select distinct
c1,
first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
first_value(c3) OVER (PARTITION BY c1 order by c3) AS c3,
first_value(c4) OVER (PARTITION BY c1 order by c4) AS c4
from
t;

From the explain plan (attached at the end of the email) I see that
postgresql is doing several sorts one per window function and one for the
distinct that seems ok.
However all the window functions being on the same partition I would have
expected postgresql to "share" a preliminary sort on c1 that would then be
useful to reduce the work on all window functions but it doesn't.
I even created an index on c1 hoping that postgresql would be able to use
it in order to minimize the cost of the sorts but I couldn't make it use it.

Is there something I am missing?

You can find below a script to set up a table and data to reproduce as well
as the explain plan.

*Setup Script*
create table t(
pk varchar(200) PRIMARY key,
c1 varchar(200),
c2 varchar(200),
c3 varchar(200),
c4 varchar(200)
);
create index i1 on t (c1);

insert into t
(pk, c1, c2, c3, c4 )
select
generate_series::text pk,
'Grp' ||(generate_series / 4)::text c1,
generate_series::text c2,
generate_series::text c3,
generate_series::text c4
from generate_series(0, 1000000);

*Explain Plan*
Unique (cost=808480.87..820980.88 rows=1000001 width=123) (actual
time=7131.675..7781.082 rows=250001 loops=1)
-> Sort (cost=808480.87..810980.87 rows=1000001 width=123) (actual
time=7131.673..7603.926 rows=1000001 loops=1)
Sort Key: c1, (first_value(c2) OVER (?)), (first_value(c3) OVER
(?)), (first_value(c4) OVER (?))
Sort Method: external merge Disk: 59640kB
-> WindowAgg (cost=558937.90..578937.92 rows=1000001 width=123)
(actual time=5179.374..6268.937 rows=1000001 loops=1)
-> Sort (cost=558937.90..561437.90 rows=1000001 width=91)
(actual time=5179.355..5679.136 rows=1000001 loops=1)
Sort Key: c1, c4
Sort Method: external merge Disk: 52912kB
-> WindowAgg (cost=336736.93..356736.95 rows=1000001
width=91) (actual time=3260.950..4389.116 rows=1000001 loops=1)
-> Sort (cost=336736.93..339236.93 rows=1000001
width=59) (actual time=3260.934..3778.385 rows=1000001 loops=1)
Sort Key: c1, c3
Sort Method: external merge Disk: 46176kB
-> WindowAgg (cost=141877.96..161877.98
rows=1000001 width=59) (actual time=1444.692..2477.284 rows=1000001 loops=1)
-> Sort (cost=141877.96..144377.96
rows=1000001 width=27) (actual time=1444.669..1906.993 rows=1000001 loops=1)
Sort Key: c1, c2
Sort Method: external merge
Disk: 39424kB
-> Seq Scan on t
(cost=0.00..18294.01 rows=1000001 width=27) (actual time=0.011..177.815
rows=1000001 loops=1)
Planning Time: 0.214 ms
Execution Time: 7839.646 ms

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Lewis 2020-07-06 19:51:36 Re: Apply LIMIT when computation is logically irrelevant
Previous Message Christophe Courtois 2020-07-06 16:41:37 Pgbench : vacuum default behaviour history