Re: Index Skip Scan

From: James Coleman <jtc331(at)gmail(dot)com>
To: a(dot)kuzmenkov(at)postgrespro(dot)ru
Cc: jesper(dot)pedersen(at)redhat(dot)com, 9erthalion6(at)gmail(dot)com, pg(at)bowt(dot)ie, thomas(dot)munro(at)enterprisedb(dot)com, bhushan(dot)uparkar(at)gmail(dot)com, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2018-11-15 14:28:03
Message-ID: CAAaqYe_+q=iqDzd8cJsYpcG+4JtK1_zdv01EKtLJFZ4bcpmQHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Is skip scan only possible for index-only scan?

I haven't see much discussion of this question yet. Is there a
particular reason to lock ourselves into thinking about this only in
an index only scan?

>> I think we can improve this,
>> and the skip scan can be strictly faster than index scan regardless of
>> the data. As a first approximation, imagine that we somehow skipped
>> equal tuples inside _bt_next instead of sending them to the parent
>> Unique node. This would already be marginally faster than Unique + Index
>> scan. A more practical implementation would be to remember our position
>> in tree (that is, BTStack returned by _bt_search) and use it to skip
>> pages in bulk. This looks straightforward to implement for a tree that
>> does not change, but I'm not sure how to make it work with concurrent
>> modifications. Still, this looks a worthwhile direction to me, because
>> if we have a strictly faster skip scan, we can just use it always and
>> not worry about our unreliable statistics. What do you think?
>>
>
> This is something to look at -- maybe there is a way to use
> btpo_next/btpo_prev instead/too in order to speed things up. Atm we just
> have the scan key in BTScanOpaqueData. I'll take a look after my
> upcoming vacation; feel free to contribute those changes in the meantime
> of course.

It seems to me also that the logic necessary for this kind of
traversal has other useful applications. For example, it should be
possible to build on that logic to allow and index like t(owner_fk,
created_at) to be used to execute the following query:

select *
from t
where owner_fk in (1,2,3)
order by created_at
limit 25

without needing to fetch all tuples satisfying "owner_fk in (1,2,3)"
and subsequently sorting them.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-11-15 15:20:33 Re: New Defects reported by Coverity Scan for PostgreSQL
Previous Message Robert Haas 2018-11-15 14:10:17 Re: [HACKERS] generated columns