Re: Index Skip Scan

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2019-07-22 05:44:54
Message-ID: CAKJS1f8-Qk254gn8exps7qD5oDNLyOO6LCiDqsWNufmi3w-atg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 17 Jul 2019 at 04:53, Jesper Pedersen
<jesper(dot)pedersen(at)redhat(dot)com> wrote:
> Here is a patch more in that direction.

Thanks. I've just had a look over this and it roughly what I have in mind.

Here are the comments I noted down during the review:

cost_index:

I know you've not finished here, but I think it'll need to adjust
tuples_fetched somehow to account for estimate_num_groups() on the
Path's unique keys. Any Eclass with an ec_has_const = true does not
need to be part of the estimate there as there can only be at most one
value for these.

For example, in a query such as:

SELECT x,y FROM t WHERE x = 1 GROUP BY x,y;

you only need to perform estimate_num_groups() on "y".

I'm really not quite sure on what exactly will be required from
amcostestimate() here. The cost of the skip scan is not the same as
the normal scan. So other that API needs adjusted to allow the caller
to mention that we want skip scans estimated, or there needs to be
another callback.

build_index_paths:

I don't quite see where you're checking if the query's unique_keys
match what unique keys can be produced by the index. This is done for
pathkeys with:

pathkeys_possibly_useful = (scantype != ST_BITMAPSCAN &&
!found_lower_saop_clause &&
has_useful_pathkeys(root, rel));
index_is_ordered = (index->sortopfamily != NULL);
if (index_is_ordered && pathkeys_possibly_useful)
{
index_pathkeys = build_index_pathkeys(root, index,
ForwardScanDirection);
useful_pathkeys = truncate_useless_pathkeys(root, rel,
index_pathkeys);
orderbyclauses = NIL;
orderbyclausecols = NIL;
}

Here has_useful_pathkeys() checks if the query requires some ordering.
For unique keys you'll want to do the same. You'll have set the
query's unique key requirements in standard_qp_callback().

I think basically build_index_paths() should be building index paths
with unique keys, for all indexes that can support the query's unique
keys. I'm just a bit uncertain if we need to create both a normal
index path and another path for the same index with unique keys.
Perhaps we can figure that out down the line somewhere. Maybe it's
best to build path types for now, when possible, and we can consider
later if we can skip the non-uniquekey paths. Likely that would
require a big XXX comment to explain we need to review that before the
code makes it into core.

get_uniquekeys_for_index:

I think this needs to follow more the lead from build_index_pathkeys.
Basically, ask the index what it's pathkeys are.

standard_qp_callback:

build_group_uniquekeys & build_distinct_uniquekeys could likely be one
function that takes a list of SortGroupClause. You just either pass
the groupClause or distinctClause in. Pretty much the UniqueKey
version of make_pathkeys_for_sortclauses().

> Some questions:
>
> 1) Do we really need the UniqueKey struct ? If it only contains the
> EquivalenceClass field then we could just have a list of those instead.
> That would make the patch simpler.

I dunno about that. I understand it looks a bit pointless due to just
having one field, but perhaps we can worry about that later. If we
choose to ditch it and replace it with just an EquivalenceClass then
we can do that later.

> 2) Do we need both canon_uniquekeys and query_uniquekeys ? Currently
> the patch only uses canon_uniquekeys because the we attach the list
> directly on the Path node.

canon_uniquekeys should store at most one UniqueKey per
EquivalenceClass. The reason for this is for fast comparison. We can
compare memory addresses rather than checking individual fields are
equal. Now... yeah it's true that there is only one field so far and
we could just check the pointers are equal on the EquivalenceClasses,
but I think maybe this is in the same boat as #1. Let's do it for now
so we're sticking as close to the guidelines laid out by PathKeys and
once it's all working and plugged into skip scans then we can decide
if it needs a simplification pass over the code.

> I'll clean the patch up based on your feedback, and then start to rebase
> v21 on it.

Cool.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuro Yamada 2019-07-22 06:02:16 Re: progress report for ANALYZE
Previous Message Alexander Lakhin 2019-07-22 05:00:00 Re: Fix typos and inconsistencies for HEAD (take 7)