Re: Some revises in adding sorting path

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some revises in adding sorting path
Date: 2023-02-21 09:02:44
Message-ID: CAMbWs4-X42MCvgZUWjC3VrTXm5eQzbKxQBLapwKgjJehQZ0JYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 14, 2023 at 10:53 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> I looked at the three patches and have some thoughts:

Thanks for reviewing!

> 0001:
>
> Does the newly added test have to be this complex? I think it might
> be better to just come up with the most simple test you can that uses
> an incremental sort. I really can't think why the test requires a FOR
> UPDATE, to test incremental sort, for example. The danger with making
> a test more complex than it needs to be is that it frequently gets
> broken by unrelated changes. The complexity makes it harder for
> people to understand the test's intentions and that increases the risk
> that the test eventually does not test what it was originally meant to
> test as the underlying code changes and the expected output is
> updated.

That makes sense. I agree that we should always use the minimal query
for test. For this patch, as pointed by Etsuro, it may not be a good
idea for the change. So I'll remove 0001.

> 0002:
>
> I think the following existing comment does not seem to be true any longer:
>
> > However, there's one more
> > * possibility: it may make sense to sort the cheapest partial path
> > * according to the required output order and then use Gather Merge.
>
> You've removed the comment that talks about trying Incremental Sort ->
> Gather Merge paths yet the code is still doing that, the two are just
> more consolidated now, so perhaps you need to come up with a new
> comment to explain what we're trying to achieve.

Yes, you are right. How about the comment below?

- * possibility: it may make sense to sort the cheapest partial path
- * according to the required output order and then use Gather Merge.
+ * possibility: it may make sense to sort the cheapest partial path or
+ * incrementally sort any partial path that is partially sorted according
+ * to the required output order and then use Gather Merge.

Looking at the codes now I have some concern that what we do in
create_ordered_paths for partial paths may have already been done in
generate_useful_gather_paths, especially when query_pathkeys is equal to
sort_pathkeys. Not sure if this is a problem. And the comment there
mentions generate_gather_paths(), but ISTM we should mention what
generate_useful_gather_paths has done.

> 0003:
>
> Looking at gather_grouping_paths(), I see it calls
> generate_useful_gather_paths() which generates a bunch of Gather Merge
> paths after sorting the cheapest path and incrementally sorting any
> partially sorted paths. Then back in gather_grouping_paths(), we go
> and create Gather Merge paths again, but this time according to the
> group_pathkeys instead of the query_pathkeys. I know you're not really
> changing anything here, but as I'm struggling to understand why
> exactly we're creating two sets of Gather Merge paths, it makes it a
> bit scary to consider changing anything in this area. I've not really
> found any comments that can explain to me sufficiently well enough so
> that I understand why this needs to be done. Do you know?

I'm also not sure why gather_grouping_paths creates Gather Merge paths
according to group_pathkeys after what generate_useful_gather_paths has
done. There is comment that mentions this but seems more explanation is
needed.

* generate_useful_gather_paths does most of the work, but we also consider a
* special case: we could try sorting the data by the group_pathkeys and then
* applying Gather Merge.

It seems that if there is available group_pathkeys, we will set
query_pathkeys to group_pathkeys because we want the result sorted for
grouping. In this case gather_grouping_paths may just generate
duplicate Gather Merge paths because generate_useful_gather_paths has
generated Gather Merge paths according to query_pathkeys. I tried to
reduce the code of gather_grouping_paths to just call
generate_useful_gather_paths and found no diffs in regression tests.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-02-21 09:34:11 Re: Incorrect command tag row count for MERGE with a cross-partition update
Previous Message Richard Guo 2023-02-21 08:55:24 Re: Some revises in adding sorting path