Re: Index Skip Scan

From: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2019-07-16 16:53:53
Message-ID: 5e8f78fe-fcd5-b0e8-628b-4ab2bbcba5ab@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

On 7/11/19 7:38 AM, David Rowley wrote:
> The UniqueKeys idea is quite separate from pathkeys. Currently, a
> Path can have a List of PathKeys which define the order that the
> tuples will be read from the Plan node that's created from that Path.
> The idea with UniqueKeys is that a Path can also have a non-empty List
> of UniqueKeys to define that there will be no more than 1 row with the
> same value for the Paths UniqueKey column/exprs.
>
> As of now, if you look at standard_qp_callback() sets
> root->query_pathkeys, the idea here would be that the callback would
> also set a new List field named "query_uniquekeys" based on the
> group_pathkeys when non-empty and !root->query->hasAggs, or by using
> the distinct clause if it's non-empty. Then in build_index_paths()
> around the call to match_pathkeys_to_index() we'll probably also want
> to check if the index can support UniqueKeys that would suit the
> query_uniquekeys that were set during standard_qp_callback().
>
> As for setting query_uniquekeys in standard_qp_callback(), this should
> be simply a matter of looping over either group_pathkeys or
> distinct_pathkeys and grabbing the pk_eclass from each key and making
> a canonical UniqueKey from that. To have these canonical you'll need
> to have a new field in PlannerInfo named canon_uniquekeys which will
> do for UniqueKeys what canon_pathkeys does for PathKeys. So you'll
> need an equivalent of make_canonical_pathkey() in uniquekey.c
>
> With this implementation, the code that the patch adds in
> create_distinct_paths() can mostly disappear. You'd only need to look
> for a path that uniquekeys_contained_in() matches the
> root->query_uniquekeys and then just leave it to
> set_cheapest(distinct_rel); to pick the cheapest path.
>
> It would be wasted effort to create paths with UniqueKeys if there's
> multiple non-dead base rels in the query as the final rel in
> create_distinct_paths would be a join rel, so it might be worth
> checking that before creating such index paths in build_index_paths().
> However, down the line, we could carry the uniquekeys forward into the
> join paths if the join does not duplicate rows from the other side of
> the join... That's future stuff though, not for this patch, I don't
> think.
>
> I think a UniqueKey can just be a struct similar to PathKey, e.g be
> located in pathnodes.h around where PathKey is defined. Likely we'll
> need a uniquekeys.c file that has the equivalent to
> pathkeys_contained_in() ... uniquekeys_contained_in(), which return
> true if arg1 is a superset of arg2 rather than check for one set being
> a prefix of another. As you mention above: UniqueKeys { x, y } ==
> UniqueKeys { y, x }. That superset check could perhaps be optimized
> by sorting UniqueKey lists in memory address order, that'll save
> having a nested loop, but likely that's not going to be required for a
> first cut version. This would work since you'd want UniqueKeys to be
> canonical the same as PathKeys are (Notice that compare_pathkeys()
> only checks memory addresses of pathkeys and not equals()).
>
> I think the UniqueKey struct would only need to contain an
> EquivalenceClass field. I think all the other stuff that's in PathKey
> is irrelevant to UniqueKey.
>

Here is a patch more in that direction.

Some questions:

1) Do we really need the UniqueKey struct ? If it only contains the
EquivalenceClass field then we could just have a list of those instead.
That would make the patch simpler.

2) Do we need both canon_uniquekeys and query_uniquekeys ? Currently
the patch only uses canon_uniquekeys because the we attach the list
directly on the Path node.

I'll clean the patch up based on your feedback, and then start to rebase
v21 on it.

Thanks in advance !

Best regards,
Jesper

Attachment Content-Type Size
v2_uniquekey.txt text/plain 19.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Pedersen 2019-07-16 17:03:12 Re: pg_receivewal documentation
Previous Message Laurenz Albe 2019-07-16 16:28:57 Re: pg_receivewal documentation