Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Floris Van Nee <florisvannee(at)optiver(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "rushabh(dot)lathia(at)gmail(dot)com" <rushabh(dot)lathia(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date: 2020-12-05 15:10:28
Message-ID: CAKU4AWpYsa-L2--qOMcJFHxzw7T5px5ZmrVaTE3MO3mW4J6uEw@mail.gmail.com
Lists: pgsql-hackers

Thank you Heikki for your attention.

On Mon, Nov 30, 2020 at 11:20 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 30/11/2020 16:30, Jesper Pedersen wrote:
> > On 11/30/20 5:04 AM, Heikki Linnakangas wrote:
> >> On 26/11/2020 16:58, Andy Fan wrote:
> >>> This patch has stopped moving for a while, any suggestion about
> >>> how to move on is appreciated.
> >>
> >> The question on whether UniqueKey.exprs should be a list of
> >> EquivalenceClasses or PathKeys is unresolved. I don't have an opinion
> >> on that, but I'd suggest that you pick one or the other and just go
> >> with it. If it turns out to be a bad choice, then we'll change it.
> >
> > In this case I think it is matter of deciding if we are going to use
> > EquivalenceClasses or Exprs before going further; there has been work
> > ongoing in this area for a while, so having a clear direction from a
> > committer would be greatly appreciated.
>
> Plain Exprs are not good enough, because you need to know which operator
> the expression is unique on. Usually, it's the default = operator in the
> default btree opclass for the datatype, but it could be something else,
> too.

I don't quite understand this; could you explain more? As far as I know,
when we run "SELECT DISTINCT a FROM t", we never care about which operator
is used to decide uniqueness.

> There's some precedent for PathKeys, as we generate PathKeys to
> represent the DISTINCT column in PlannerInfo->distinct_pathkeys. On the
> other hand, I've always found it confusing that we use PathKeys to
> represent DISTINCT and GROUP BY, which are not actually sort orderings.

OK, I have the same confusion now:)

> Perhaps it would make sense to store EquivalenceClass+opfamily in
> UniqueKey, and also replace distinct_pathkeys and group_pathkeys with
> UniqueKeys.

I can understand why we need EquivalenceClass for UniqueKey, but I can't
understand why we need opfamily here.

For anyone who is interested in this patchset, here is my current plan:

1). I will try EquivalenceClass rather than Expr in UniqueKey, and add
    opfamily if needed.
2). I will start a new thread to continue this topic. The current thread
    is too long, which may scare off people who would otherwise be
    interested in it.
3). I will drop patches 5 & 6 for now. One reason is that I am not happy
    with the current implementation; the other is that I want to make the
    patchset smaller so that it is easier to review. I am not giving up on
    them for good: after the main part of this patchset is committed, I
    will continue with them in a new thread.
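
One possible shape for item 1 could look roughly like the sketch below. It
is only a toy to make the discussion concrete: ToyEquivalenceClass,
ToyUniqueKeyItem, ToyUniqueKey and toy_uniquekey_is_subset are names made up
for this mail, the Oid typedef is a stand-in, and none of it is code from the
patch or from the planner. It assumes each unique-key column is recorded as
an EquivalenceClass plus the opfamily whose equality defines "unique".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's Oid; this file is a toy, not planner code. */
typedef unsigned int Oid;

/* Stand-in for an EquivalenceClass: only its identity matters here. */
typedef struct ToyEquivalenceClass
{
	int			id;
} ToyEquivalenceClass;

/* Hypothetical item: one EC plus the opfamily that defines equality. */
typedef struct ToyUniqueKeyItem
{
	const ToyEquivalenceClass *eclass;
	Oid			opfamily;
} ToyUniqueKeyItem;

/* Hypothetical unique key: rows are unique on the combination of items. */
typedef struct ToyUniqueKey
{
	const ToyUniqueKeyItem *items;
	size_t		nitems;
} ToyUniqueKey;

/*
 * Return true if every item of 'key' also appears in 'wanted' with the same
 * EquivalenceClass and the same opfamily.  If so, rows that are unique on
 * 'key' are also unique on the 'wanted' columns under the same equality.
 */
static bool
toy_uniquekey_is_subset(const ToyUniqueKey *key,
						const ToyUniqueKeyItem *wanted, size_t nwanted)
{
	assert(key != NULL);

	for (size_t i = 0; i < key->nitems; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < nwanted; j++)
		{
			if (key->items[i].eclass == wanted[j].eclass &&
				key->items[i].opfamily == wanted[j].opfamily)
			{
				found = true;
				break;
			}
		}
		if (!found)
			return false;		/* same column under another opfamily fails too */
	}
	return true;
}
```

In this toy, a key on column a under one opfamily does not match a request
for a under a different opfamily, which is the case where storing a plain
Expr alone would not be enough to tell the two apart.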

Thanks everyone for your input.

--
Best Regards
Andy Fan
