Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-19 03:28:01
Message-ID: CAApHDvpAO5H_L84kn9gCJ_hihOavtmDjimKYyftjWtF69BJ=8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 19 Jan 2023 at 06:27, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
> Hmm, not really sure why did I miss that. I tried this again (added
> following in postgres.c above
>
> PortalStart)
>
> List* l = NIL;
> l = lappend(l, 1);
> l = lappend(l, 2);
> l = lappend(l, 3);
> l = lappend(l, 4);
>
> ListCell *lc;
> foreach_reverse(lc, l)
> {
> if (foreach_current_index(lc) == 2) // delete 3
> {
> foreach_delete_current(l, lc);
> }
> }

The problem is that the next item looked at is 1 and the value 2 is skipped.

> I also tried a bit unrealistic case.
>
> create table abcdefgh(a int, b int, c int, d int, e int, f int, g int, h int);
>
> insert into abcdefgh select a,b,c,d,e,f,g,h from
> generate_series(1,7) a,
> generate_series(1,7) b,
> generate_series(1,7) c,
> generate_series(1,7) d,
> generate_series(1,7) e,
> generate_series(1,7) f,
> generate_series(1,7) g,
> generate_series(1,7) h;
>
> explain analyze select count(*) over (order by a),
> row_number() over (partition by a order by b) from abcdefgh order by a,b,c,d,e,f,g,h;
>
> In patch version

> Execution Time: 82748.789 ms

> In Master

> Execution Time: 71172.586 ms

> Patch version sort took around 15 sec, whereas master sort took ~2 + 46
> = around 48 sec

> Maybe I am interpreting it wrong but still wanted to bring this to notice.

I think you are misinterpreting the results, but the main point
remains - it's slower. The explain analyze timing shows the time
between outputting the first row and the last row. For sort, there's a
lot of work that needs to be done before you output the first row.

I looked into this a bit further and using the same table as you, and
the attached set of hacks that adjust the ORDER BY path generation to
split a Sort into a Sort and Incremental Sort when the number of
pathkeys to sort by is > 2.

work_mem = '1GB' in all cases below:

I turned off the timing in EXPLAIN so that wasn't a factor in the
results. Generally having more nodes means more timing requests and
that's got > 0 overhead.

explain (analyze,timing off) select * from abcdefgh order by a,b,c,d,e,f,g,h;

7^8 rows
Master: Execution Time: 7444.479 ms
master + sort_hacks.diff: Execution Time: 5147.937 ms

So I'm also seeing Incremental Sort - > Sort faster than a single Sort
for 7^8 rows.

With 5^8 rows:
master: Execution Time: 299.949 ms
master + sort_hacks: Execution Time: 239.447 ms

and 4^8 rows:
master: Execution Time: 62.596 ms
master + sort_hacks: Execution Time: 67.900 ms

So at 4^8 sort_hacks becomes slower. I suspect this might be to do
with having to perform more swaps in the array elements that we're
sorting by with the full sort. When work_mem is large this array no
longer fits in CPU cache and I suspect those swaps become more costly
when there are more cache lines having to be fetched from RAM.

I think we really should fix tuplesort.c so that it batches sorts into
about L3 CPU cache-sized chunks in RAM rather than trying to sort much
larger arrays.

I'm just unsure if we should write this off as the expected behaviour
of Sort and continue with the patch or delay the whole thing until we
make some improvements to sort. I think more benchmarking is required
so we can figure out if this is a corner case or a common case. On the
other hand, we already sort WindowClauses with the most strict sort
order first which results in a full Sort and no additional sort for
subsequent WindowClauses that can make use of that sort order. It
would be bizarre to reverse the final few lines of common_prefix_cmp
so that we sort the least strict order first so we end up injecting
Incremental Sorts into the plan to make it faster!

David

Attachment Content-Type Size
sort_hacks.diff text/plain 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2023-01-19 04:02:08 Re: Remove source code display from \df+?
Previous Message Peter Geoghegan 2023-01-19 03:26:22 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation