Re: Parallel Index Scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2017-02-10 17:57:40
Message-ID: CA+Tgmob8x02wSWtD7W8YF0r8PJBN7WQjcDRk1sLU0F9i1rX4AA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 1, 2017 at 8:20 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> The hunk in indexam.c looks like a leftover that can be omitted.
>
> It is not a leftover hunk. Earlier, the patch has the same check
> btparallelrescan, but based on your comment up thread [1] (Why does
> btparallelrescan cater to the case where scan->parallel_scan== NULL?),
> this has been moved to indexam.c.

That's not my point. The point is, if it's not a parallel scan,
index_parallelrescan() should never get called in the first place. So
therefore it shouldn't need to worry about scan->parallel_scan being
NULL. It seems possibly reasonable to put something like
Assert(scan->parallel_scan != NULL) in there, but arranging to return
without doing anything in that case seems like it just masks possible
bugs in the calling code. And really I'm not sure we even need the
Assert.

>> I am a bit mystified about how this manages to work with array keys.
>> _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
>> unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
>> _bt_parallel_advance_scan() won't do anything unless btps_pageStatus
>> is already BTPARALLEL_DONE.
>
> This is just to ensure that btps_arrayKeyCount is advanced once and
> btps_pageStatus is changed to BTPARALLEL_DONE once per array element.
> So it goes something like, if we have array with values [1,2,3], then
> all the workers will complete the scan with key 1 and one of them will
> mark btps_pageStatus as BTPARALLEL_DONE and then first one to hit
> _bt_parallel_advance_scan will increment the value of
> btps_arrayKeyCount, then same will happen for key 2 and 3. It is
> quite possible that by the time one of the participant advances it
> local key, the scan for that key is already finished and we handle
> that situation in _bt_parallel_seize() with below check:
>
> if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
> *status = false;
>
> I think Rahila has also mentioned something on above lines, let us
> know if we are missing something here? Do you want to add more
> comments in the code to explain this handling, if yes, then where (on
> top of function _bt_parallel_advance_scan)?

You know, I just misread this code. Both of you are right, and I am wrong.

>> That
>> is a little odd. Another, possibly related thing that is odd is that
>> when _bt_steppage() finds no tuples and decides to advance to a new
>> page again, there's a very short comment in the forward case and a
>> very long comment in the backward case:
>>
>> /* nope, keep going */
>> vs.
>> /*
>> * For parallel scans, get the last page scanned as it is quite
>> * possible that by the time we try to fetch previous page, other
>> * worker has also decided to scan that previous page. We could
>> * avoid that by doing _bt_parallel_release once we have read the
>> * current page, but it is bad to make other workers wait till we
>> * read the page.
>> */
>>
>> Now it seems to me that these cases are symmetric and the issues are
>> identical. It's basically that, while we were judging whether the
>> current page has useful contents, some other process could have
>> advanced the scan (since _bt_readpage un-seized it).
>
> Yeah, but the reason of difference in comments is that for
> non-parallel backwards scans there is no code at that place to move to
> previous page and it basically relies on next call to _bt_walk_left()
> whereas for parallel-scans, we can't simply rely on _bt_walk_left().
> I have slightly modified the comments for backward scan case, see if
> that looks better and if not, then suggest what you think is better.

Why can't we rely on _bt_walk_left?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-02-10 18:01:03 Re: WIP: Faster Expression Processing and Tuple Deforming (including JIT)
Previous Message Robert Haas 2017-02-10 17:34:50 Re: Parallel Index Scans