RE: Index Skip Scan (new UniqueKeys)

From: Floris Van Nee <florisvannee(at)Optiver(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>
Subject: RE: Index Skip Scan (new UniqueKeys)
Date: 2020-07-23 11:43:56
Message-ID: 8a483a1889b24e888f6cf37bc5baea85@opammb0561.comp.optiver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> One UniqueKey can have multiple corresponding expressions, which gives us
> also possibility of having one unique key with (t1.a, t2.a) and it looks now
> similar to EquivalenceClass.
>

I believe the current definition of a unique key with two expressions (t1.a, t2.a) means that it's unique on the tuple (t1.a, t2.a) - this gives weaker guarantees than uniqueness on (t1.a) and uniqueness on (t2.a).

>
> The idea behind this query sounds questionable to me, more transparent
> would be to do this without distinct, skipping will actually do exactly the same
> stuff just under another name. But if allowing skipping on constants do not
> bring significant changes in the code probably it's fine.
>

Yeah indeed, I didn't say it's a query that people should generally write. :-) It's better to write as a regular SELECT with LIMIT 1 of course. However, it's more to be consistent and predictable to the user: if a SELECT DISTINCT ON (a) * FROM t1 runs fast, then it doesn't make sense to the user if a SELECT DISTINCT ON (a) * FROM t1 WHERE a=2 runs slow. And to support it also makes the implementation more consistent with little code changes.

> >
> > Yeah, there's definitely some double work there, but the actual impact may
> be limited - it doesn't actually allocate a new path key, but it looks it up in
> root->canon_pathkeys and returns that path key.
> > I wrote it like this, because I couldn't find a way to identify from a certain
> PathKey the actual location in the index of that column. The constructed path
> keys list filters out all redundant path keys. An index on (a,a,b,a,b) becomes
> path keys (a,b). Now if we skip on (a,b) we actually need to use prefix=3. But
> how to get from PathKey=b to that number 3, I didn't find a solid way except
> doing this. Maybe there is though?
>
> I don't think there is a direct way, but why not modify build_index_paths to
> also provide this information, or compare index_pathkeys expressions with
> indextlist without actually create those pathkeys again?
>

I agree there could be other ways - I don't currently have a strong preference for either. I can have a look at this later.

> And couple of words about this thread [1]. It looks to me like a strange way
> of interacting with the community. Are you going to duplicate there
> everything, or what are your plans? At the very least you could try to include
> everyone involved in the recipients list, not exclude some of the authors.
>

When I wrote the first mail in the thread, I went to this thread [1] and included everyone from there, but I see now that I only included the to: and cc: people and forgot the original thread author, you. I'm sorry about that - I should've looked better to make sure I had everyone.
In any case, my plan is to keep the patch at least applicable to master, as I believe it can be helpful for discussions about both patches.

[1] https://www.postgresql.org/message-id/20200609102247.jdlatmfyeecg52fi%40localhost

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-07-23 11:55:19 Re: Compatible defaults for LEAD/LAG
Previous Message Daniel Gustafsson 2020-07-23 11:35:24 Re: explain HashAggregate to report bucket and memory stats