Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, rushabh(dot)lathia(at)gmail(dot)com
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date: 2020-04-14 09:09:31
Message-ID: CAApHDvpx1qED1uLqubcKJ=oHatCMd7pTUKkdq0B72_08nbR3Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 3 Apr 2020 at 15:18, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> All above 4 item Done.

Just to explain my view on this going forward for PG14. I do plan to
do a more thorough review of this soon. I wasn't so keen on pursuing
this for PG13 as the skip scans patch [1] needs to use the same
infrastructure this patch has added and it does not, yet.

The infrastructure (knowing the unique properties of a RelOptInfo), as
provided by the patch Andy has been working on, which is based on my
rough prototype version, I believe should be used for the skip scans
patch as well. I understand that as skip scans currently stands,
Jasper has done quite a bit of work to add the UniqueKeys, however,
this was unfortunately based on some early description of UniqueKeys
where I had thought that we could just store EquivalenceClasses. I no
longer think that's the case, and I believe the implementation that we
require is something more along the lines of Andy's latest version of
the patch. However, I've not quite stared at it long enough to be
highly confident in that.

I'd like to strike up a bit of a plan to move both Andy's work and the
Skip scans work forward for PG14.

Here are my thoughts so far:

1. Revise v4 of remove DISTINCT patch to split the patch into two pieces.

0001 should add the UniqueKey code but not any additional planner
smarts to use them (i.e remove GROUP BY / DISTINCT) elimination parts.
Join removals and Unique joins should use UniqueKeys in this patch.
0002 should add back the GROUP BY / DISTINCT smarts and add whatever
tests should be added for that and include updating existing expected
results and modifying any tests which no longer properly test what
they're meant to be testing.

I've done this with the attached patch.

2. David / Jesper to look at 0001 and build or align the existing skip
scans 0001 patch to make use of Andy's 0001 patch. This will require
tagging UniqueKeys onto Paths, not just RelOptInfos, plus a bunch of
other work.

Clearly UniqueKeys must suit both needs and since we have two
different implementations each providing some subset of the features,
then clearly we're not yet ready to move both skip scans and this
patch forward together. We need to align that and move both patches
forward together. Hopefully, the attached 0001 patch helps move that
along.

While I'm here, a quick review of Andy's v4 patch. I didn't address
any of this in the attached v5. These are only based on what I saw
when shuffling some code around. It's not an in-depth review.

1. Out of date comment in join.sql

-- join removal is not possible when the GROUP BY contains a column that is
-- not in the join condition. (Note: as of 9.6, we notice that b.id is a
-- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
-- but this happens too late for join removal in the outer plan level.)
explain (costs off)
select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
on d.a = s.d;

You've changed the GROUP BY clause so it does not include b.id, so the
Note in the comment is now misleading.

2. I think 0002 is overly restrictive in its demands that
parse->hasAggs must be false. We should be able to just use a Group
Aggregate with unsorted input when the input_rel is unique on the
GROUP BY clause. This will save on hashing and sorting. Basically
similar to what we do for when a query contains aggregates without any
GROUP BY.

3. I don't quite understand why you changed this to a right join:

-- Test case where t1 can be optimized but not t2
explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y

Perhaps this change is left over from some previous version of the patch?

[1] https://commitfest.postgresql.org/27/1741/

Attachment Content-Type Size
v5-0001-Introduce-UniqueKeys-to-determine-RelOptInfo-uniq.patch application/octet-stream 64.1 KB
v5-0002-Skip-DISTINCT-GROUP-BY-if-input-is-already-unique.patch application/octet-stream 24.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-04-14 09:12:22 Re: index paths and enable_indexscan
Previous Message Julien Rouhaud 2020-04-14 09:00:21 Re: Lexer issues