Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, rushabh(dot)lathia(at)gmail(dot)com, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date: 2020-05-13 15:48:25
Message-ID: CAKU4AWo06r1H1tpj_W4RCUFOPufgZtcqoLeFCdt74LWS1WyGyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Fri, May 8, 2020 at 7:27 AM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
> >> + else if (inner_is_onerow)
> >> + {
> >> + /* Since rows in innerrel can't be duplicated AND if
> >> innerrel is onerow,
> >> + * the join result will be onerow also as well. Note:
> >> onerow implies
> >> + * multi_nullvals = false.
> >>
> >> I don't understand this comment. Why is there only one row in the other
> >> relation which can join to this row?
> >
> >
> > I guess you may miss the onerow special case if I understand correctly.
> > inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".
> > innerrel can't be duplicated means: t1.y = t2.pk; so the finally
> result is onerow
> > as well. One of the overall query is SELECT .. FROM t1, t2 where t2.y
> = t2.pk;
> >
> >
> > I explained more about onerow in the v7 README.unqiuekey document, just
> copy
> > it here.
> For some reason this mail remained in my drafts without being sent.
> Sending it now. Sorry.
>
> My impression about the one row stuff, is that there is too much
> special casing around it. We should somehow structure the UniqueKey
> data so that one row unique keys come naturally rather than special
> cased. E.g every column in such a case is unique in the result so
> create as many UniqueKeys are the number of columns

This is the beginning state of the UniqueKey, later David suggested
this as an optimization[1], I buy-in the idea and later I found it mean
more than the original one [2], so I think onerow is needed actually.

> or create one
> unique key with no column as you have done but handle it more
> gracefully rather than spreading it all over the place.

I think this is what I do now, but it is possible that I spread it more than
necessary, if so, please let me know. I maintained the README.uniquekey
carefully since v7 and improved a lot in v8, it may be a good place to check
it.

>
Also, the amount of code that these patches changes seems to be much
> larger than the feature's worth arguably. But it indicates that we are
> modifying/adding more code than necessary. Some of that code can be
> merged into existing code which does similar things as I have pointed
> out in my previous comment.
>
>
I have reused the code select_mergejoin_clause rather than maintaining my
own copies in v8. Thanks a lot about that suggestion. This happened mainly
because I didn't read enough code. I will go through more to see if I have
similar
issues.

> Thanks for working on the expanded scope of the initial feature you
> proposed. But it makes the feature more useful, I think.
>
> That's mainly because your suggestions are always insightful which makes me
willing to continue to work on it, so thank you all!

===
In fact, I was hesitated that how to reply an email when I send an new
version
of the patch. One idea is I should explain clear what is the difference
between Vn
and Vn-1. The other one is not many people read the Vn-1, so I would like
to keep
the email simplified and keep the README clearly to save the reviewer's
time.
Actually there are more changes in v8 than I stated above. for example:
for the
UniqueKey on baserelation, we will reduce the expr from the UniqueKey if
the
expr is a const. Unique on (A, B). query is SELECT b FROM t WHERE a =
1;
in v7, the UniqueKey is (a, b). In v8, the UniqueKey is (b) only. But
since most
people still not start to read it, so I add such information to README
rather than
echo the same in email thread. I will try more to understand how to
communicate more
smooth. But any suggestion on this part is appreciated.

Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-05-13 16:02:42 Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Previous Message Dilip Kumar 2020-05-13 15:46:26 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions