Re: Index Skip Scan

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Floris Van Nee <florisvannee(at)optiver(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, James Coleman <jtc331(at)gmail(dot)com>
Subject: Re: Index Skip Scan
Date: 2019-06-24 13:04:10
Message-ID: 20190624130410.rhor2lcffdvrh7ka@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 24, 2019 at 01:44:14PM +0200, Dmitry Dolgov wrote:
>> On Sun, Jun 23, 2019 at 1:04 PM Floris Van Nee <florisvannee(at)optiver(dot)com> wrote:
>>
>> However, _bt_readpage messes things up, because it only reads tuples that
>> match all the provided keys (so where b=2)
>
>Right, the problem you've reported first had a similar origins. I'm starting to
>think that probably using _bt_readpage just like that is not exactly right
>thing to do, since the correct element is already found and there is no need to
>check if tuples are matching after one step back. I'll try to avoid it in the
>next version of patch.
>
>> I was wondering about something else: don't we also have another problem with
>> updating this current index tuple by skipping before calling
>> btgettuple/_bt_next? I see there's some code in btgettuple to kill dead tuples
>> when scan->kill_prior_tuple is true. I'm not too familiar with the concept of
>> killing dead tuples while doing index scans, but by looking at the code it
>> seems to be possible that btgettuple returns a tuple, caller processes it and
>> sets kill_prior_tuple to true in order to have it killed. However, then the
>> skip scan kicks in, which sets the current tuple to a completely different
>> tuple. Then, on the next call of btgettuple, the wrong tuple gets killed. Is my
>> reasoning correct here or am I missing something?
>
>Need to check, but probably we can avoid that by setting kill_prior_tuple to
>false in case of skip scan as in index_rescan.
>
>> On Sun, Jun 23, 2019 at 3:10 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> I've done some initial review on v20 - just reading through the code, no
>> tests at this point. Here are my comments:
>
>Thank you!
>
>> 2) indices.sgml
>>
>> The new section is somewhat unclear and difficult to understand, I think
>> it'd deserve a rewording. Also, I wonder if we really should link to the
>> wiki page about FSM problems. We have a couple of wiki links in the sgml
>> docs, but those seem more generic while this seems as a development page
>> that might disapper. But more importantly, that wiki page does not say
>> anything about "Loose Index scans" so is it even the right wiki page?
>
>Wow, indeed, looks like it's a totally wrong reference. I think Kyotaro already
>mentioned it too, so probably I'm going to remove it (and instead describe the
>idea in a few words in the documentation itself).
>
>> 6) nodeIndexScan.c
>>
>> I wonder why we even add and initialize the ioss_ fields for IndexScan
>> nodes, when the skip scans require index-only scans?
>
>Skip scans required index-only scans until recently, when the patch was updated
>to incorporate the same approach for index scans too. My apologies, looks like
>documentation and some commentaries are still inconsistent about this topic.
>

Yes, if that's the case then various bits of docs and comments are rather
misleading, ant fields in IndexScanState should be named 'iss_'.

>> 7) pathnode.c
>>
>> I wonder how much was the costing discussed. It seems to me the logic is
>> fairly similar to ideas discussed in the incremental sort patch, and
>> we've been discussing some weak points there. I'm not sure how much we
>> need to consider those issues here.
>
>Can you please elaborate in a few words, which issues do you mean? Is it about
>non uniform distribution of distinct values? If so, I believe it's partially
>addressed when we have to skip too often, by searching a next index page.
>Although yeah, there is still an assumption about uniform distribution of
>distinct groups at the planning time.
>

Right, it's mostly about what happens when the group sizes are not close
to average size. The question is what happens in such cases - how much
slower will the plan be, compared to "current" plan without a skip scan?

I don't have a very good idea of the additional overhead associated with
skip-scans - presumably it's a bit more expensive, right?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2019-06-24 13:54:01 Re: Choosing values for multivariate MCV lists
Previous Message James Coleman 2019-06-24 12:59:16 Re: [PATCH] Incremental sort (was: PoC: Partial sort)