Re: Parallel Index Scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2017-02-13 12:17:22
Message-ID: CA+Tgmobq-NDTaVbRftiQOM4wSXCAQWbU7SpV9TEP-yhdHyY8bA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>> Why can't we rely on _bt_walk_left?
>>> The reason is mentioned in comments, but let me try to explain with
>>> some example. When you reach that point of code, it means that either
>>> the current page (assume page number is 10) doesn't contain any
>>> matching items or it is a half-dead page, both of which indicate that
>>> we have to move to the previous page. Now, before checking if the
>>> current page contains matching items, we signal parallel machinery
>>> (via _bt_parallel_release) to allow workers to read the previous page
>>> (assume previous page number is 9). So it is quite possible that,
>>> after we decide that the current page (page number 10) doesn't
>>> contain any matching tuples, some other worker has already read
>>> page 9 by the time we move to it directly with _bt_walk_left. In
>>> short, if we directly use _bt_walk_left(), then we are prone to
>>> returning some of the values twice, as multiple workers can read
>>> the same page.
>> But ... the entire point of the seize-and-release stuff is to avoid
>> this problem. You're supposed to seize the scan, read the current
>> page, walk left, store the page you find in the scan, and then release
>> the scan.
> Exactly and that is what is done in the patch. Basically, if we found
> that the current page is half-dead or it doesn't contain any matching
> items, then release the current buffer, seize the scan, read the
> current page, walk left and so on. I am slightly confused here
> because it seems both of us agree on what the right thing to do is,
> and as far as I can tell that is how it is implemented. Are you just
> checking whether I have implemented it as discussed, or do you see a
> problem with the way it is implemented?

Well, before, I thought you said that relying entirely on
_bt_walk_left couldn't work because then two people might end up
running it at the same time, and that would cause problems. But if
you can only run _bt_walk_left while you've got the scan seized, then
that can't happen. Evidently I'm missing something here.
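
To make the point concrete, here is a toy sketch of the seize-and-release
idea. It is not PostgreSQL code and not what the patch does: the page map
LEFT_SIBLING, the ParallelScan class, and claim_page() are all hypothetical
names, and a plain mutex stands in for seizing the scan. The only thing it
illustrates is that if the walk left happens while the scan is seized, no
two workers can end up on the same page.

```python
import threading

# Toy model, not PostgreSQL code: each page number maps to its left sibling.
# None marks the leftmost page (nothing further to walk to).
LEFT_SIBLING = {10: 9, 9: 8, 8: 7, 7: None}


class ParallelScan:
    """Hypothetical shared scan state: the next page some worker should read."""

    def __init__(self, start_page):
        self._lock = threading.Lock()
        self._next_page = start_page

    def claim_page(self):
        # "Seize" the scan: only one worker at a time may advance it.
        with self._lock:
            page = self._next_page
            if page is not None:
                # Walk left and store the page we found in the shared state.
                self._next_page = LEFT_SIBLING[page]
            # Leaving the 'with' block "releases" the scan.
            return page


def worker(scan, visited, visited_lock):
    while True:
        page = scan.claim_page()
        if page is None:
            return  # scan is finished
        # The page is processed outside the seized section.
        with visited_lock:
            visited.append(page)


def run_scan(num_workers=4):
    scan = ParallelScan(start_page=10)
    visited = []
    visited_lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(scan, visited, visited_lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return visited
```

Calling run_scan() returns the pages in whatever order the workers happened
to claim them, but each page appears exactly once, because the step to the
left sibling is only ever taken by the worker that currently holds the scan.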

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
