Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-26 02:10:44
Message-ID: CAApHDvqh+qOHk4sbvvy=Qr2NjPqAAVYf82oXY0g=Z2hRpC2Vmg@mail.gmail.com
Lists: pgsql-hackers

On Wed, 25 Jan 2023 at 06:32, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
> I am not seeing other cases where patch version is consistently slower.

I just put together a benchmark script to try to help paint a picture
of the circumstances under which reducing the number of sorts slows
down performance.

The benchmark I used creates a table with 2 int4 columns, "a" and "b".
The query has a single WindowClause that sorts on "a", plus an
ORDER BY a, b. In each test, I change the number of distinct values in
"a", starting with 1 distinct value and then multiplying by 10 in each
subsequent test, all the way up to 1 million. The idea here is to
control how often the sort that orders by both columns must call the
tiebreak function. I ran 3 subtests for each number of distinct values
in "a": 1) an unsorted case, where the rows are put into the
tuplestore in an order that's not presorted; 2) a presorted case,
where the tuplestore rows are already sorted, hitting our qsort fast
path; and 3) rows in random order.
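
For illustration, the query shape is roughly the following (just a
sketch with made-up table and window function names; the exact queries
are in the attached script):

CREATE TABLE t (a int, b int);
-- e.g. the 10 distinct "a" values case, 1 million rows, random order
INSERT INTO t SELECT x % 10, x
  FROM generate_series(1, 1000000) x ORDER BY random();
SELECT a, b, count(*) OVER (ORDER BY a) FROM t ORDER BY a, b;

With the patch, the planner can perform one sort on (a, b) that
satisfies both the WindowClause and the final ORDER BY; on master,
there's a sort on (a) for the window and then another sort for the
ORDER BY.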

You can see from the results that the patched version is not looking
particularly great. I did a 1 million row test and a 10 million row
test. work_mem was 4GB for each, so the sorts never went to disk.

Overall, the 1 million row test on master took 11.411 seconds and the
patched version took 11.282 seconds, so there was a *small* gain there
(around 1%). With the 10 million row test, master took 121.063 seconds
overall, whereas patched took 130.727 seconds. So quite a large
performance regression there (around 8%).

I'm unsure if 69749243 might be partially to blame here, as it favours
single-key sorts. If you look at qsort_tuple_signed_compare(), you'll
see that the tiebreak function is called only when it's needed, i.e.
when the leading keys compare equal and there are > 1 sort keys, and
that the comparetup function then re-compares the first key all over
again. If I get some more time I'll run the tests again with the sort
specialisation code disabled to see if the situation is the same or
not.
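
For reference, the specialisation's compare function looks roughly
like this (paraphrased, not the exact source; field names are
approximate):

static int
qsort_tuple_signed_compare(SortTuple *a, SortTuple *b,
                           Tuplesortstate *state)
{
    /* compare the leading key using the cached datum1 values */
    int compare = ApplySignedSortComparator(a->datum1, a->isnull1,
                                            b->datum1, b->isnull1,
                                            &state->sortKeys[0]);

    if (compare != 0)
        return compare;

    /* no further sort keys, so no tiebreak required */
    if (state->onlyKey != NULL)
        return 0;

    /* the tiebreak starts over from key[0] */
    return state->comparetup(a, b, state);
}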

Another way to test that would be to have a table with 3 columns and
always sort by at least 2.
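
i.e. something along these lines (a sketch, hypothetical table):

CREATE TABLE t3 (a int, b int, c int);
SELECT a, b, c, count(*) OVER (ORDER BY a, b) FROM t3
ORDER BY a, b, c;

That way both the window sort and the final sort always have > 1 sort
key, taking the single-key fast path out of the picture on both sides.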

I've attached the benchmark script that I used and also a copy of the
patch with a GUC added solely to allow easier benchmarking of patched
vs unpatched.

I think we need to park this patch until we can figure out what can be
done to stop these regressions. We might want to look at: 1) expanding
on what 69749243 did and considering whether we want one set of sort
specialisations specifically for a single column and another set for
multiple columns, where the multi-column ones don't need to re-compare
key[0] again; 2) sorting in smaller batches that better fit into CPU
cache. Both of these ideas would require a large amount of testing and
discussion. For #1, we're considering other specialisations too, for
example NOT NULL, and we don't want to explode the number of
specialisations we have to compile into the binary.
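
To sketch what #1 might look like in spirit (a self-contained toy, not
PostgreSQL code): the multi-key comparator's tiebreak would start at
key[1] rather than re-comparing key[0]:

#include <stdlib.h>

typedef struct Row { int key[2]; } Row;

/* tiebreak that assumes key[0] has already compared equal */
static int
tiebreak_from_key1(const Row *a, const Row *b)
{
    return (a->key[1] > b->key[1]) - (a->key[1] < b->key[1]);
}

static int
compare_rows(const void *pa, const void *pb)
{
    const Row *a = pa;
    const Row *b = pb;

    /* leading-key fast path, akin to comparing the cached datum1 */
    if (a->key[0] != b->key[0])
        return (a->key[0] > b->key[0]) - (a->key[0] < b->key[0]);

    /* multi-column specialisation: no need to re-compare key[0] */
    return tiebreak_from_key1(a, b);
}

/* usage: qsort(rows, nrows, sizeof(Row), compare_rows); */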

David

Attachment Content-Type Size
bench_windowsort.sh.txt text/plain 1.2 KB
1million_row_test.png image/png 119.7 KB
10million_row_test.png image/png 99.2 KB
orderby_windowclause_pushdown_testing_only.patch.txt text/plain 19.3 KB
