Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
Date: 2022-10-13 03:47:02
Message-ID: CAMbWs49zGrqG2dg_AcP6TWmCQ+3o4eJvWC533q3jJ5O=X3N=CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 13, 2022 at 4:50 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> The problem is that I'm only added the LimitPath to the
> cheapest_total_path. I think to make your case work we'd need to add
> the LimitPath only in cases where the distinct_pathkeys are empty but
> the sort_pathkeys are not and hasDistinctOn is true and the path has
> pathkeys_contained_in(root->sort_pathkeys, path->pathkeys). I think
> that's doable, but it's become quite a bit more complex than the patch
> I proposed. Maybe it's worth a 2nd effort for that part?

Currently in the patch the optimization is done before we check for
presorted paths or do the explicit sort of the cheapest path. How about
we move this optimization into the branch where we've found any
presorted paths? Maybe something like:

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4780,11 +4780,24 @@ create_final_distinct_paths(PlannerInfo *root,
RelOptInfo *input_rel,

if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
{
- add_path(distinct_rel, (Path *)
- create_upper_unique_path(root, distinct_rel,
- path,
-
list_length(root->distinct_pathkeys),
- numDistinctRows));
+ if (root->distinct_pathkeys == NIL)
+ {
+ Node *limitCount = (Node *) makeConst(INT8OID, -1,
InvalidOid,
+ sizeof(int64),
+ Int64GetDatum(1),
false,
+ FLOAT8PASSBYVAL);
+
+ add_path(distinct_rel, (Path *)
+ create_limit_path(root, distinct_rel,
+ path, NULL, limitCount,
+ LIMIT_OPTION_COUNT, 0, 1));
+ }
+ else
+ add_path(distinct_rel, (Path *)
+ create_upper_unique_path(root, distinct_rel,
+ path,
+
list_length(root->distinct_pathkeys),
+ numDistinctRows));

This again makes the code less 'straightforward', just to cover a more
uncommon case. I'm also not sure if it's worth doing.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-10-13 03:49:26 Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message Richard Guo 2022-10-13 03:39:50 Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant