Re: Index Skip Scan

From: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To: Floris Van Nee <florisvannee(at)Optiver(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2020-01-20 19:01:20
Message-ID: 802e46fa-dc2c-4c32-e55b-884e04aa7f12@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Floris,

On 1/15/20 8:33 AM, Floris Van Nee wrote:
> I reviewed the latest version of the patch. Overall some good improvements I think. Please find my feedback below.
>

Thanks for your review !

> - I think I mentioned this before - it's not that big of a deal, but it just looks weird and inconsistent to me:
> create table t2 as (select a, b, c, 10 d from generate_series(1,5) a, generate_series(1,100) b, generate_series(1,10000) c); create index on t2 (a,b,c desc);
>
> postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b>=5 and b<=5 order by a,b,c desc;
> QUERY PLAN
> ---------------------------------------------------------------------------------
> Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..216.25 rows=500 width=12)
> Skip scan: true
> Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
> (3 rows)
>
> postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5 order by a,b,c desc;
> QUERY PLAN
> -----------------------------------------------------------------------------------------
> Unique (cost=0.43..8361.56 rows=500 width=12)
> -> Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..8361.56 rows=9807 width=12)
> Index Cond: ((a = 2) AND (b = 5))
> (3 rows)
>
> When doing a distinct on (params) and having equality conditions for all params, it falls back to the regular index scan even though there's no reason not to use the skip scan here. It's much faster to write b between 5 and 5 now rather than writing b=5. I understand this was a limitation of the unique-keys patch at the moment which could be addressed in the future. I think for the sake of consistency it would make sense if this eventually gets addressed.
>

Agreed, that it is an improvement that should be made. I would like
David's view on this since it relates to the UniqueKey patch.

> - nodeIndexScan.c, line 126
> This sets xs_want_itup to true in all cases (even for non skip-scans). I don't think this is acceptable given the side-effects this has (page will never be unpinned in between returned tuples in _bt_drop_lock_and_maybe_pin)
>

Correct - fixed.

> - nbsearch.c, _bt_skip, line 1440
> _bt_update_skip_scankeys(scan, indexRel); This function is called twice now - once in the else {} and immediately after that outside of the else. The second call can be removed I think.
>

Yes, removed the "else" call site.

> - nbtsearch.c _bt_skip line 1597
> LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
> scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);
>
> This is an UNLOCK followed by a read of the unlocked page. That looks incorrect?
>

Yes, that needed to be changed.

> - nbtsearch.c _bt_skip line 1440
> if (BTScanPosIsValid(so->currPos) &&
> _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
>
> Is it allowed to look at the high key / low key of the page without have a read lock on it?
>

In case of a split the page will still contain a high key and a low key,
so this should be ok.

> - nbtsearch.c line 1634
> if (_bt_readpage(scan, indexdir, offnum)) ...
> else
> error()
>
> Is it really guaranteed that a match can be found on the page itself? Isn't it possible that an extra index condition, not part of the scan key, makes none of the keys on the page match?
>

The logic for this has been changed.

> - nbtsearch.c in general
> Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1] I've been trying to come up with cases in which this may break the patch, but I haven't able to produce such a scenario - so it may be fine. But it would be good to consider again. One thing I was thinking of was a scenario where page splits and/or compacting would happen in between returning tuples. Could this break the _bt_scankey_within_page check such that it thinks the scan key is within the current page, while it actually isn't? Mainly for backward and/or cursor scans. Forward scans shouldn't be a problem I think. Apologies for being a bit vague as I don't have a clear example ready when it would go wrong. It may well be fine, but it was one of the things on my mind.
>

There is a BT_READ lock in place when finding the correct leaf page, or
searching within the leaf page itself. _bt_vacuum_one_page deletes only
LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do
you have some feedback for this ?

Please, find the updated patches attached that Dmitry and I made.

Thanks again !

Best regards,
Jesper

Attachment Content-Type Size
v31_0001-Unique-key.patch text/x-patch 25.3 KB
v31-0002-Index-skip-scan.patch text/x-patch 69.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-01-20 19:03:29 Re: [HACKERS] kqueue
Previous Message Andres Freund 2020-01-20 19:00:14 Re: Physical replication slot advance is not persistent