Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, rushabh(dot)lathia(at)gmail(dot)com
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date: 2020-06-07 06:51:22
Message-ID: CAApHDvrDGGyhWFAcYuCbcj=3vuLgO+w2-GC=QDNVv3DdAzwRbQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, 6 Jun 2020 at 21:15, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> My concerns are more about having two different sets of distinct
> uniquekeys:
>
> * one prepared in standard_qp_callback for skip scan (I guess those
> should be added to PlannerInfo?)

Yes. Those must be set so that we know whether, and for which keys,
we should try to create Skip Scan index paths, just as we create index
paths for PlannerInfo.query_pathkeys.
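To illustrate the analogy with query_pathkeys, here is a minimal sketch
in Python (not the planner's C code) of how a query-level set of
distinct uniquekeys could decide which indexes are worth trying a skip
scan on. The Index class, skip_scan_candidates() and the index names
t_x_idx and t_y_z_idx are hypothetical, and the rule that the distinct
columns must be exactly the index's leading key columns is an assumed
simplification.

```python
from dataclasses import dataclass

# Hypothetical, simplified model (not PostgreSQL's structs): columns are
# strings and a uniquekey is a frozenset of column names.


@dataclass(frozen=True)
class Index:
    name: str
    keycols: tuple  # key columns in index order


def skip_scan_candidates(indexes, query_uniquekeys):
    """Names of indexes a skip scan could use to return rows unique on
    query_uniquekeys.

    Assumed rule: the index's leading len(query_uniquekeys) key columns
    must be exactly the distinct columns, in any order.
    """
    n = len(query_uniquekeys)
    if n == 0:
        return []
    return [idx.name for idx in indexes
            if len(idx.keycols) >= n
            and frozenset(idx.keycols[:n]) == query_uniquekeys]


# Like query_pathkeys, the query-level uniquekeys would be derived once
# from the DISTINCT clause, e.g. SELECT DISTINCT y FROM t.
indexes = [Index("t_x_idx", ("x",)), Index("t_y_z_idx", ("y", "z"))]
print(skip_scan_candidates(indexes, frozenset({"y"})))  # ['t_y_z_idx']
```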

> * one in create_distinct_paths as per current implementation
>
> with what seems to be similar content.

I think we need to have UniqueKeys in RelOptInfo so we can describe
what a relation is unique by. There's no point, for example, in
creating skip scan paths for a relation that's already unique on
whatever we might try to skip scan on, e.g. if someone does:

SELECT DISTINCT unique_and_indexed_column FROM tab;

Since there's a unique index on unique_and_indexed_column, we
needn't try to create a skipscan path for it.
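A minimal sketch of the check implied here, in Python rather than
planner C code: a relation is already unique on a DISTINCT list when
one of its uniquekeys is a subset of the distinct columns. The name
rel_is_unique_on() is hypothetical, and the sketch assumes every
uniquekey is backed by a unique index on NOT NULL columns (such as a
primary key), since a default unique index still allows several NULLs,
which DISTINCT would collapse into one row.

```python
# Hypothetical helper: uniquekeys are frozensets of column names, and each
# one is assumed to come from a unique index on NOT NULL columns.
def rel_is_unique_on(rel_uniquekeys, distinct_cols):
    """True if the relation's rows are already unique on distinct_cols.

    Rows unique on a subset of the DISTINCT columns are unique on the whole
    list, so neither a skip scan nor a DISTINCT step is needed.
    """
    return any(uk <= distinct_cols for uk in rel_uniquekeys)


rel_uniquekeys = [frozenset({"unique_and_indexed_column"})]
print(rel_is_unique_on(rel_uniquekeys, frozenset({"unique_and_indexed_column"})))  # True
print(rel_is_unique_on(rel_uniquekeys, frozenset({"other_column"})))  # False
```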

However, the advantages of having UniqueKeys on the RelOptInfo go a
little deeper than that. We can make use of them anywhere we currently
call relation_has_unique_index_for(). Plus, we get what Andy wants and
can skip useless DISTINCT operations when the result is already unique
on the distinct clause. Sure, we could carry all the relation's unique
properties around in Paths, but that's not the right place: uniqueness
is logically a property of the relation, not of any specific path.
RelOptInfo is a good place to store the properties of relations.

The meaning of uniquekeys within a path is that the path itself is
specifically making those keys unique. We're not duplicating the
RelOptInfo's uniquekeys there.

If we have a table like:

CREATE TABLE tab (
a INT PRIMARY KEY,
b INT NOT NULL
);

CREATE INDEX tab_b_idx ON tab (b);

Then I'd expect a query such as SELECT DISTINCT b FROM tab to have
the uniquekeys for tab's RelOptInfo set to {a}, the seqscan and index
scan paths' uniquekeys set to NULL, and the skipscan index path on
tab_b_idx to have its uniquekeys set to {b}. Then, when we go to
create the distinct paths, Andy's work will see that none of the
RelOptInfo's uniquekeys covers the distinct clause, but the skip scan
work will loop over the unique_pathlist and find that we have a
skipscan path with the required uniquekeys, i.e. {b}.
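The example above can be modelled roughly as follows. This is a
hypothetical, heavily simplified Python sketch: RelOptInfo and Path
here are plain data classes standing in for the planner structs,
choose_distinct_input() is an invented name, and the returned strings
merely label the plan shape. A path's uniquekeys must equal the
distinct columns exactly, because a skip scan discards rows: a skip
scan unique on {b} cannot serve DISTINCT b, c.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional

# Hypothetical stand-ins for the planner structs, following the proposal:
# RelOptInfo.uniquekeys says what the relation is unique by on its own;
# Path.uniquekeys is set only when the path itself makes keys unique.


@dataclass
class Path:
    name: str
    uniquekeys: Optional[FrozenSet[str]] = None


@dataclass
class RelOptInfo:
    uniquekeys: List[FrozenSet[str]]
    pathlist: List[Path] = field(default_factory=list)
    unique_pathlist: List[Path] = field(default_factory=list)


def choose_distinct_input(rel, distinct_cols):
    # 1. The relation is already unique on the DISTINCT clause.
    if any(uk <= distinct_cols for uk in rel.uniquekeys):
        return "no DISTINCT needed"
    # 2. A path that makes exactly the DISTINCT columns unique, e.g. a skip
    #    scan. Equality is required because such a path discards rows.
    for path in rel.unique_pathlist:
        if path.uniquekeys == distinct_cols:
            return path.name
    # 3. Otherwise, a sort- or hash-based DISTINCT over rel.pathlist.
    return "sort/hash-based DISTINCT"


tab = RelOptInfo(
    uniquekeys=[frozenset({"a"})],  # from PRIMARY KEY (a)
    pathlist=[Path("SeqScan on tab"), Path("IndexScan using tab_b_idx")],
    unique_pathlist=[Path("SkipScan using tab_b_idx", frozenset({"b"}))],
)
print(choose_distinct_input(tab, frozenset({"b"})))  # SkipScan using tab_b_idx
print(choose_distinct_input(tab, frozenset({"a"})))  # no DISTINCT needed
```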

Does that make sense?

David
