Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-28 08:21:09
Message-ID: 52a3def2-4d2b-fb76-e334-5079ac67d7c7@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On 26/01/23 07:40, David Rowley wrote:

> You can see from the results that the patched version is not looking
> particularly great. I did a 1 million row test and a 10 million row
> test. work_mem was 4GB for each, so the sorts never went to disk.

Yes, its lackluster gains are very limited (pretty much when data gets
pushed to disk).

> I'm unsure if 69749243 might be partially to blame here as it favours
> single-key sorts. If you look at qsort_tuple_signed_compare(), you'll
> see that the tiebreak function will be called only when it's needed
> and there are > 1 sort keys. The comparetup function will re-compare
> the first key all over again. If I get some more time I'll run the
> tests again with the sort specialisation code disabled to see if the
> situation is the same or not.
> Another way to test that would be to have a table with 3 columns and
> always sort by at least 2.

I will need to go through this.

> I've attached the benchmark script that I used and also a copy of the
> patch with a GUC added solely to allow easier benchmarking of patched
> vs unpatched.

This is much relief, will be easier to repro and create more cases. Thanks.

> I think we need to park this patch until we can figure out what can be
> done to stop these regressions.

Makes sense.

> We might want to look at 1) Expanding
> on what 69749243 did and considering if we want sort specialisations
> that are specifically for 1 column and another set for multi-columns.
> The multi-column ones don't need to re-compare key[0] again. 2)
> Sorting in smaller batches that better fit into CPU cache. Both of
> these ideas would require a large amount of testing and discussion.
> For #1 we're considering other specialisations, for example NOT NULL,
> and we don't want to explode the number of specialisations we have to
> compile into the binary.

Yes, 1 & 2 needs to be addressed before going ahead with this patch.
Do we any have ongoing thread with #1 and #2?

Also, we have seen an issue with cost ( 1 sort vs 2 sorts, both having same cost)
https://www.postgresql.org/message-id/CAApHDvo2y9S2AO-BPYo7gMPYD0XE2Lo-KFLnqX80fcftqBCcyw@mail.gmail.com
This needs to be investigated too (although might not be related but need to check at least)

I will do some study around things mentioned here and will setup new threads (if they don't exists)
based on discussions we had here .

Thanks,
Ankit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2023-01-28 09:01:35 Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Previous Message Nathan Bossart 2023-01-28 06:27:29 Re: recovery modules