Re: Index Skip Scan

From: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2019-07-22 17:10:47
Message-ID: 1d917274-73c1-24de-9faf-c1d2c542cb23@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 7/22/19 1:44 AM, David Rowley wrote:
> Here are the comments I noted down during the review:
>
> cost_index:
>
> I know you've not finished here, but I think it'll need to adjust
> tuples_fetched somehow to account for estimate_num_groups() on the
> Path's unique keys. Any Eclass with an ec_has_const = true does not
> need to be part of the estimate there as there can only be at most one
> value for these.
>
> For example, in a query such as:
>
> SELECT x,y FROM t WHERE x = 1 GROUP BY x,y;
>
> you only need to perform estimate_num_groups() on "y".
>
> I'm really not quite sure on what exactly will be required from
> amcostestimate() here. The cost of the skip scan is not the same as
> the normal scan. So other that API needs adjusted to allow the caller
> to mention that we want skip scans estimated, or there needs to be
> another callback.
>

I think this part will become more clear once the index skip scan patch
is rebased, as we got the uniquekeys field on the Path, and the
indexskipprefixy info on the IndexPath node.

> build_index_paths:
>
> I don't quite see where you're checking if the query's unique_keys
> match what unique keys can be produced by the index. This is done for
> pathkeys with:
>
> pathkeys_possibly_useful = (scantype != ST_BITMAPSCAN &&
> !found_lower_saop_clause &&
> has_useful_pathkeys(root, rel));
> index_is_ordered = (index->sortopfamily != NULL);
> if (index_is_ordered && pathkeys_possibly_useful)
> {
> index_pathkeys = build_index_pathkeys(root, index,
> ForwardScanDirection);
> useful_pathkeys = truncate_useless_pathkeys(root, rel,
> index_pathkeys);
> orderbyclauses = NIL;
> orderbyclausecols = NIL;
> }
>
> Here has_useful_pathkeys() checks if the query requires some ordering.
> For unique keys you'll want to do the same. You'll have set the
> query's unique key requirements in standard_qp_callback().
>
> I think basically build_index_paths() should be building index paths
> with unique keys, for all indexes that can support the query's unique
> keys. I'm just a bit uncertain if we need to create both a normal
> index path and another path for the same index with unique keys.
> Perhaps we can figure that out down the line somewhere. Maybe it's
> best to build path types for now, when possible, and we can consider
> later if we can skip the non-uniquekey paths. Likely that would
> require a big XXX comment to explain we need to review that before the
> code makes it into core.
>
> get_uniquekeys_for_index:
>
> I think this needs to follow more the lead from build_index_pathkeys.
> Basically, ask the index what it's pathkeys are.
>
> standard_qp_callback:
>
> build_group_uniquekeys & build_distinct_uniquekeys could likely be one
> function that takes a list of SortGroupClause. You just either pass
> the groupClause or distinctClause in. Pretty much the UniqueKey
> version of make_pathkeys_for_sortclauses().
>
Yeah, I'll move this part of the index skip scan patch to the unique key
patch.

Thanks for your feedback !

Best regards,
Jesper

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Pedersen 2019-07-22 17:25:41 Re: pg_receivewal documentation
Previous Message Alvaro Herrera 2019-07-22 17:05:32 Re: Add parallelism and glibc dependent only options to reindexdb