Re: Allow DISTINCT to use Incremental Sort

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow DISTINCT to use Incremental Sort
Date: 2023-01-09 13:28:16
Message-ID: CAMbWs4_7Jq7ZQRREEYtOec91pymB7kEYXJV9PcBF0ymKVMDQPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 7, 2023 at 5:47 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> While working on the regression tests added in a14a58329, I noticed
> that DISTINCT does not make use of Incremental Sort. It'll only ever
> do full sorts on the cheapest input path or make use of a path that's
> already got the required pathkeys. Also, I see that
> create_final_distinct_paths() is a little quirky and if the cheapest
> input path happens to be sorted, it'll add_path() the same path twice,
> which seems like a bit of a waste of effort. That could happen if say
> enable_seqscan is off or if a Merge Join is the cheapest join method.
>
> Additionally, the parallel DISTINCT code looks like it should also get
> the same treatment. I see that I'd coded this to only add a unique
> path atop of a presorted path and it never considers sorting the
> cheapest partial path. I've adjusted that in the attached and also
> made it consider incremental sorting any path with presorted keys.
>
> Please see the attached patch.

+1 for the changes. A minor comment is that previously on HEAD for
SELECT DISTINCT case, if we have to do an explicit full sort atop the
cheapest path, we try to make sure to always use the more rigorous
ordering.

/* For explicit-sort case, always use the more rigorous clause */
if (list_length(root->distinct_pathkeys) <
list_length(root->sort_pathkeys))
{
needed_pathkeys = root->sort_pathkeys;
/* Assert checks that parser didn't mess up... */
Assert(pathkeys_contained_in(root->distinct_pathkeys,
needed_pathkeys));
}
else
needed_pathkeys = root->distinct_pathkeys;

I'm not sure if this is necessary, as AFAIU the parser should have
ensured that the sortClause is always a prefix of distinctClause.

In the patch this code has been removed. I think we should also remove
the related comments in create_final_distinct_paths.

* When we have DISTINCT ON, we must sort by the more rigorous of
* DISTINCT and ORDER BY, else it won't have the desired behavior.
- * Also, if we do have to do an explicit sort, we might as well use
- * the more rigorous ordering to avoid a second sort later. (Note
- * that the parser will have ensured that one clause is a prefix of
- * the other.)

Also, the comment just above this one is outdated too.

* First, if we have any adequately-presorted paths, just stick a
* Unique node on those. Then consider doing an explicit sort of the
* cheapest input path and Unique'ing that.

The two-step workflow is what is the case on HEAD but not any more in
the patch. And I think we should mention incremental sort on any paths
with presorted keys.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maxim Orlov 2023-01-09 13:29:01 Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Previous Message Andrew Dunstan 2023-01-09 13:00:26 Allow +group in pg_ident.conf