Re: [PATCH] Erase the distinctClause if the result is unique by definition

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date: 2020-02-11 16:29:06
Message-ID: CAExHW5sG2Q7aPAh4vpk85QhnuFfDBJYc3yFNGb43x6vc498rsA@mail.gmail.com

On Mon, Feb 10, 2020 at 10:57 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> writes:
> >> On Sat, Feb 8, 2020 at 12:53 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> >> Do you mean adding some information into PlannerInfo, and when we create
> >> a node for Unique/HashAggregate/Group, we can just create a dummy node?
>
> > Not so much as PlannerInfo but something on lines of PathKey. See PathKey
> > structure and related code. What I envision is PathKey class is also
> > annotated with the information whether that PathKey implies uniqueness.
> > E.g. a PathKey derived from a Primary index would imply uniqueness also. A
> > PathKey derived from say Group operation also implies uniqueness. Then just
> > by looking at the underlying Path we would be able to say whether we need
> > Group/Unique node on top of it or not. I think that would make it much
> > wider usecase and a very useful optimization.
>
> FWIW, that doesn't seem like a very prudent approach to me, because it
> confuses sorted-ness with unique-ness. PathKeys are about sorting,
> but it's possible to have uniqueness guarantees without having sorted
> anything, for instance via hashed grouping.
>
> I haven't looked at this patch, but I'd expect it to use infrastructure
> related to query_is_distinct_for(), and that doesn't deal in PathKeys.

Thanks for the pointer. I think there's another problem with my approach.
PathKeys are specific to paths, since the order of the result depends upon
the Path. But uniqueness is a property of the result, i.e. the relation, and
thus should be attached to RelOptInfo as query_is_distinct_for() does. I think
uniqueness should bubble up the RelOptInfo tree, annotating each RelOptInfo
with the minimum set of TLEs which make the result from that relation
unique. Thus we could eliminate the extra Group/Unique node if the underlying
RelOptInfo's unique column set is a subset of the columns required to be
unique.
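
To make the idea concrete, here is a rough standalone sketch in C. It is not
planner code: SketchRel, ColSet and all the sketch_* functions are
hypothetical names invented for illustration, a bitmask stands in for a set
of TLEs, only one unique key is tracked per relation, both join inputs are
assumed to number their columns in the same shared space, and NULL handling
and outer joins are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a set of output column positions (0..63) as a bitmask. */
typedef uint64_t ColSet;

static ColSet
colset_make1(int col)
{
    assert(col >= 0 && col < 64);
    return (ColSet) 1 << col;
}

/* True if every column in a is also in b. */
static bool
colset_is_subset(ColSet a, ColSet b)
{
    return (a & ~b) == 0;
}

/* Hypothetical stand-in for the uniqueness annotation on a RelOptInfo. */
typedef struct SketchRel
{
    bool    has_unique_key;  /* is any uniqueness guarantee known? */
    ColSet  unique_key;      /* columns that make each row unique */
} SketchRel;

/* A base relation with, say, a primary key on the given columns. */
static SketchRel
sketch_base_unique(ColSet key)
{
    SketchRel rel = {true, key};
    return rel;
}

/*
 * Bubble uniqueness up through an inner join: if both inputs are unique,
 * the union of their keys is unique for the join result. Otherwise this
 * sketch conservatively claims nothing.
 */
static SketchRel
sketch_join(SketchRel outer, SketchRel inner)
{
    SketchRel rel = {false, 0};

    if (outer.has_unique_key && inner.has_unique_key)
    {
        rel.has_unique_key = true;
        rel.unique_key = outer.unique_key | inner.unique_key;
    }
    return rel;
}

/*
 * A Group/Unique node on top of rel is unnecessary when rel's unique key
 * is a subset of the columns required to be distinct.
 */
static bool
sketch_needs_unique_node(SketchRel rel, ColSet required)
{
    if (rel.has_unique_key && colset_is_subset(rel.unique_key, required))
        return false;
    return true;
}
```

With two inputs unique on columns 0 and 2, the join is unique on {0, 2}, so
a DISTINCT over columns {0, 1, 2} would need no extra node, while a DISTINCT
over column 0 alone still would.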
--
Best Wishes,
Ashutosh Bapat
