RE: Index Skip Scan

From: Floris Van Nee <florisvannee(at)Optiver(dot)com>
To: "jesper(dot)pedersen(at)redhat(dot)com" <jesper(dot)pedersen(at)redhat(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "James Coleman" <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: RE: Index Skip Scan
Date: 2020-01-15 13:33:41
Message-ID: 1e59d800b38141c5af2ad3a64cf9758e@opammb0562.comp.optiver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

I reviewed the latest version of the patch. Overall some good improvements I think. Please find my feedback below.

- I think I mentioned this before - it's not that big of a deal, but it just looks weird and inconsistent to me:
create table t2 as (select a, b, c, 10 d from generate_series(1,5) a, generate_series(1,100) b, generate_series(1,10000) c); create index on t2 (a,b,c desc);

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b>=5 and b<=5 order by a,b,c desc;
QUERY PLAN
---------------------------------------------------------------------------------
Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..216.25 rows=500 width=12)
Skip scan: true
Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
(3 rows)

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5 order by a,b,c desc;
QUERY PLAN
-----------------------------------------------------------------------------------------
Unique (cost=0.43..8361.56 rows=500 width=12)
-> Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..8361.56 rows=9807 width=12)
Index Cond: ((a = 2) AND (b = 5))
(3 rows)

When doing a distinct on (params) and having equality conditions for all params, it falls back to the regular index scan even though there's no reason not to use the skip scan here. It's much faster to write b between 5 and 5 now rather than writing b=5. I understand this was a limitation of the unique-keys patch at the moment which could be addressed in the future. I think for the sake of consistency it would make sense if this eventually gets addressed.

- nodeIndexScan.c, line 126
This sets xs_want_itup to true in all cases (even for non skip-scans). I don't think this is acceptable given the side-effects this has (page will never be unpinned in between returned tuples in _bt_drop_lock_and_maybe_pin)

- nbsearch.c, _bt_skip, line 1440
_bt_update_skip_scankeys(scan, indexRel); This function is called twice now - once in the else {} and immediately after that outside of the else. The second call can be removed I think.

- nbtsearch.c _bt_skip line 1597
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);

This is an UNLOCK followed by a read of the unlocked page. That looks incorrect?

- nbtsearch.c _bt_skip line 1440
if (BTScanPosIsValid(so->currPos) &&
_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))

Is it allowed to look at the high key / low key of the page without have a read lock on it?

- nbtsearch.c line 1634
if (_bt_readpage(scan, indexdir, offnum)) ...
else
error()

Is it really guaranteed that a match can be found on the page itself? Isn't it possible that an extra index condition, not part of the scan key, makes none of the keys on the page match?

- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1] I've been trying to come up with cases in which this may break the patch, but I haven't able to produce such a scenario - so it may be fine. But it would be good to consider again. One thing I was thinking of was a scenario where page splits and/or compacting would happen in between returning tuples. Could this break the _bt_scankey_within_page check such that it thinks the scan key is within the current page, while it actually isn't? Mainly for backward and/or cursor scans. Forward scans shouldn't be a problem I think. Apologies for being a bit vague as I don't have a clear example ready when it would go wrong. It may well be fine, but it was one of the things on my mind.

[1] https://postgrespro.com/list/id/1566683972147(dot)11682(at)Optiver(dot)com

-Floris

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mahendra Singh Thalor 2020-01-15 13:34:17 Re: [HACKERS] Block level parallel vacuum
Previous Message Imai Yoshikazu 2020-01-15 13:15:47 Re: [Proposal] Add accumulated statistics for wait event