Re: Index Skip Scan

From: Floris Van Nee <florisvannee(at)Optiver(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "Peter Geoghegan" <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Thomas Munro" <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2019-07-11 07:41:31
Message-ID: 1562830890974.32121@Optiver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> For the general forward direction but for a backwards cursor scroll,

> we'd return the lowest value for each distinct prefix, but for the
> general backwards direction (DESC case) we'd return the highest value
> for each distinct prefix. Looking at IndexNext() the cursor direction
> seems to be estate->es_direction and the general scan direction is
> indicated by the plan's indexorderdir. Can't we just pass both of
> those to index_skip() to have it decide what to do? If we also pass in
> indexorderdir then index_skip() should know if it's to return the
> highest or lowest value, right?

Correct, with these two values correct behavior can be deduced. The implementation of this is a bit cumbersome though. Consider a case like:

SELECT DISTINCT ON (a) a,b,c FROM a WHERE c = 2 (with an index on a,b,c)
Data (imagine every tuple here actually occurs 10.000 times in the index to see the benefit of skipping):
1,1,1
1,1,2
1,2,2
1,2,3
2,2,1
2,2,3
3,1,1
3,1,2
3,2,2
3,2,3

Creating a cursor on this query and then moving forward, you should get (1,1,2), (3,1,2). In the current implementation of the patch, after bt_first, it skips over (1,1,2) to (2,2,1). It checks quals and moves forward one-by-one until it finds a match. This match only comes at (3,1,2) however. Then it skips to the end.

If you move the cursor backwards from the end of the cursor, you should still get (3,1,2) (1,1,2). A possible implementation would start at the end and do a skip to the beginning of the prefix: (3,1,1). Then it needs to move forward one-by-one in order to find the first matching (minimum) item (3,1,2). When it finds it, it needs to skip backwards to the beginning of prefix 2 (2,2,1). It needs to move forwards to find the minimum element, but should stop as soon as it detects that the prefix doesn't match anymore (because there is no match for prefix 2, it will move all the way from (2,2,1) to (3,1,1)). It then needs to skip backwards again to the start of prefix 1: (1,1,1) and scan forward to find (1,1,2).
Perhaps anyone can think of an easier way to implement it?

I do think being able to use DISTINCT ON is very useful and it's worth the extra complications. In the future we can add even more useful skipping features to it, for example:
SELECT DISTINCT ON (a) * FROM a WHERE b =2
After skipping to the next prefix of column a, we can start a new search for (a,b)=(prefix,2) to avoid having to move one-by-one from the start of the prefix to the first matching element. There are many other useful optimizations possible. That won't have to be for this patch though :-)

-Floris

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2019-07-11 07:47:24 Re: [HACKERS] [PATCH] Generic type subscripting
Previous Message Kyotaro Horiguchi 2019-07-11 07:40:59 Re: shared-memory based stats collector