Re: Index Skip Scan

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Floris Van Nee <florisvannee(at)Optiver(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2020-02-04 20:02:05
Message-ID: 20200204200205.ip5c2mzy3i2okqey@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Mon, Jan 27, 2020 at 02:00:39PM +0000, Floris Van Nee wrote:
>
> Thanks for the new patch! I tested it and managed to find a case that causes
> some issues. Here's how to reproduce:

So, after a bit of investigation I found out the issue (it was actually there
even in the previous version). In this only case of moving forward and reading
backward, exactly scenarious you've described above, current implementation was
not ignoring deleted tuples.

My first idea to fix this was to use _bt_readpage when necessary and put
couple of _bt_killitems when we leave a page while jumping before, so that
deleted tuples will be ignored. To demonstrate it visually, let's say we
want to go backward on a cursor over an ORDER BY a DESC, b DESC query,
i.e. return:

(1,100), (2, 100), (3, 100) etc.

To achieve that we jump from (1,1) to (1,100), from (2,1) to (2,100) and so on.
If some values are deleted, we need to read backward. E.g. if (3,100) is
deleted, we need to return (3,99).

+---------------+---------------+---------------+---------------+
| | | | |
| 1,1 ... 1,100 | 2,1 ... 2,100 | 3,1 ... 3,100 | 4,1 ... 4,100 |
| | | | |
+---------------+---------------+---------------+---------------+

| ^ | ^ | ^ | ^
| | | | | | | |
+-------------+ +-------------+ +-------------+ +-------------+

If it happened that a whole value series is deleted, we return to the
previous value and need to detect such situation. E.g. if all the values
from (3,1) to (3,100) were deleted, we will return to (2,100).

+---------------+---------------+ +---------------+
| | | | |
| 1,1 ... 1,100 | 2,1 ... 2,100 |<--------------+ 4,1 ... 4,100 |
| | | | |
+---------------+---------------+ +---------------+
^
| ^ | ^ | ^ |
| | | | | | |
+-------------+ +-------------+ +-------------+ |
+-----------------------------+

This all is implemented inside _bt_skip. Unfortunately as I see it now the idea
of relying on ability to skip dead index tuples without checking a heap tuple
is not reliable, since an index tuple will be added into killedItems and can be
marked as dead only when not a single transaction can see it anymore.

Potentially there are two possible solutions:

* Adjust the code in nodeIndexOnlyscan to perform a proper visibility check and
understand if we returned back. Obviously it will make the patch more invasive.

* Reduce scope of the patch, and simply do not apply jumping in this case. This
means less functionality but hopefully still brings some value.

At this point me and Jesper inclined to go with the second option. But maybe
I'm missing something, are there any other suggestions?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-02-04 20:24:21 Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno
Previous Message Tom Lane 2020-02-04 18:22:54 Re: Index only scan and ctid