Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, rushabh(dot)lathia(at)gmail(dot)com
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date: 2020-06-09 10:29:13
Message-ID: 20200609102913.jydqirxee5fdybhx@localhost
Lists: pgsql-hackers

> On Sun, Jun 07, 2020 at 06:51:22PM +1200, David Rowley wrote:
>
> > * one in create_distinct_paths as per current implementation
> >
> > with what seems to be similar content.
>
> I think we need to have UniqueKeys in RelOptInfo so we can describe
> what a relation is unique by. There's no point for example in
> creating skip scan paths for a relation that's already unique on
> whatever we might try to skip scan on. e.g. someone does:
>
> SELECT DISTINCT unique_and_indexed_column FROM tab;
>
> Since there's a unique index on unique_and_indexed_column then we
> needn't try to create a skipscan path for it.
>
> However, the advantages of having UniqueKeys on the RelOptInfo go a
> little deeper than that. We can make use of it anywhere we
> currently call relation_has_unique_index_for(). Plus we get what
> Andy wants and can skip useless DISTINCT operations when the result is
> already unique on the distinct clause. Sure we could carry all the
> relation's unique properties around in Paths, but that's not the right
> place. It's logically a property of the relation, not the path
> specifically. RelOptInfo is a good place to store the properties of
> relations.
>
> The idea of the meaning of uniquekeys within a path is that the path
> is specifically making those keys unique. We're not duplicating the
> RelOptInfo's uniquekeys there.
>
> If we have a table like:
>
> CREATE TABLE tab (
> a INT PRIMARY KEY,
> b INT NOT NULL
> );
>
> CREATE INDEX tab_b_idx ON tab (b);
>
> Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have
> the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and
> index scan paths uniquekey properties set to NULL, but the skipscan
> index path uniquekeys for tab_b_idx set to {b}. Then when we go
> create the distinct paths Andy's work will see that there's no
> RelOptInfo uniquekeys for the distinct clause, but the skip scan work
> will loop over the unique_pathlist and find that we have a skipscan
> path with the required uniquekeys, a.k.a {b}.
>
> Does that make sense?

Yes, from this point of view it makes sense. I've already posted the
first version of index skip scan based on this implementation [1]. There
could be rough edges, but overall I hope we're on the same page.
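
To spell out the behaviour expected for the tab example quoted above,
here is a toy model in Python. It is only a sketch, not planner code:
Rel, Path and plan_distinct are hypothetical names made up for
illustration, a unique key is reduced to a plain set of column names,
and nullability is ignored. It also assumes that relation-level keys
match when they are a subset of the distinct clause, while path-level
keys must match the distinct clause exactly.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

# Toy stand-in (an assumption of this sketch): a unique key is just a
# set of column names.
UniqueKey = FrozenSet[str]


@dataclass
class Path:
    name: str
    # Keys this particular path makes unique; empty for plain scans.
    uniquekeys: List[UniqueKey] = field(default_factory=list)


@dataclass
class Rel:
    name: str
    # What the relation itself is unique by, independent of any path.
    uniquekeys: List[UniqueKey]
    pathlist: List[Path]
    unique_pathlist: List[Path]


def plan_distinct(rel: Rel, distinct_cols: List[str]) -> str:
    cols = frozenset(distinct_cols)
    # Relation level: any unique key contained in the distinct clause
    # already guarantees distinct output, so DISTINCT is a no-op.
    if any(key <= cols for key in rel.uniquekeys):
        return "distinct is a no-op"
    # Path level: look for a path that itself makes exactly these
    # columns unique, e.g. a skip scan.
    for path in rel.unique_pathlist:
        if any(key == cols for key in path.uniquekeys):
            return f"use {path.name}"
    return "add an explicit unique step"


# The table from the quoted example: a is the primary key, b is indexed.
tab = Rel(
    name="tab",
    uniquekeys=[frozenset({"a"})],
    pathlist=[Path("seqscan"), Path("indexscan(tab_b_idx)")],
    unique_pathlist=[Path("skipscan(tab_b_idx)", [frozenset({"b"})])],
)

print(plan_distinct(tab, ["b"]))
print(plan_distinct(tab, ["a"]))
```

For SELECT DISTINCT b the relation-level check fails, since tab is only
unique by {a}, and the loop over unique_pathlist picks the skip scan
path. For SELECT DISTINCT a the relation-level key is enough and no
path needs to be considered.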

[1]: https://www.postgresql.org/message-id/flat/20200609102247.jdlatmfyeecg52fi%40localhost
