Re: Parallel Index Scans

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2017-02-01 07:40:06
Message-ID: CAH2L28t3w+XNyjfnqoeiY3hOF=uQ4YuwmUSoKs96M7Xb3bFtOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Robert,

>I am a bit mystified about how this manages to work with array keys.
>_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
>unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
>_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
>is already BTPARALLEL_DONE. It seems like one of those two things has
>to be wrong.

btps_pageStatus is to be set to BTPARALLEL_DONE only by the first worker
which is
performing scan for latest array key and which has encountered end of scan.
This is ensured by
following check in _bt_parallel_done(),

if (so->arrayKeyCount >= btscan->btps_arrayKeyCount &&
btscan->btps_pageStatus != BTPARALLEL_DONE)

Thus, BTPARALLEL_DONE marks end of scan only for latest array keys. This
ensures that when any worker reaches
_bt_advance_array_keys() it advances latest scan which is marked by
btscan->btps_arrayKeyCount only when latest scan
has ended by checking if(btps_pageStatus == BTPARALLEL_DONE) in
_bt_advance_array_keys(). Otherwise, the worker just
advances its local so->arrayKeyCount.

I hope this provides some clarification.

Thank you,
Rahila Syed

On Wed, Feb 1, 2017 at 3:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >> In spite of being careful, I missed reorganizing the functions in
> >> genam.h which I have done in attached patch.
> >
> > Cool. Committed parallel-generic-index-scan.2.patch. Thanks.
>
> Reviewing parallel_index_scan_v6.patch:
>
> I think it might be better to swap the return value and the out
> parameter for _bt_parallel_seize; that is, return a bool, and have
> callers ignore the value of the out parameter (e.g. *pageno).
>
> I think _bt_parallel_advance_scan should be renamed something that
> includes the word "keys", like _bt_parallel_advance_array_keys.
>
> The hunk in indexam.c looks like a leftover that can be omitted.
>
> +/*
> + * Below flags are used to indicate the state of parallel scan.
>
> They aren't really flags any more; they're members of an enum. I
> think you could just leave this sentence out entirely and start right
> in with descriptions of the individual values. But maybe all of those
> descriptions should end in a period (instead of one ending in a period
> but not the other three) since they read like complete sentences.
>
> + * btinitparallelscan - Initializing BTParallelScanDesc for parallel
> btree scan
>
> Initializing -> initialize
>
> + * btparallelrescan() -- reset parallel scan
>
> Previous two prototypes have one dash, this one has two. Make it
> consistent, please.
>
> + * Ideally, we don't need to acquire spinlock here, but being
> + * consistent with heap_rescan seems to be a good idea.
>
> How about: In theory, we don't need to acquire the spinlock here,
> because there shouldn't be any other workers running at this point,
> but we do so for consistency.
>
> + * _bt_parallel_seize() -- returns the next block to be scanned for
> forward
> + * scans and latest block scanned for backward scans.
>
> I think the description should be more like "Begin the process of
> advancing the scan to a new page. Other scans must wait until we call
> bt_parallel_release() or bt_parallel_done()." And likewise
> _bt_parallel_release() should say something like "Complete the process
> of advancing the scan to a new page. We now have the new value for
> btps_scanPage; some other backend can now begin advancing the scan."
> And _bt_parallel_done should say something like "Mark the parallel
> scan as complete."
>
> I am a bit mystified about how this manages to work with array keys.
> _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
> unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
> _bt_parallel_advance_scan() won't do anything unless btps_pageStatus
> is already BTPARALLEL_DONE. It seems like one of those two things has
> to be wrong.
>
> _bt_readpage's header comment should be updated to note that, in the
> case of a parallel scan, _bt_parallel_seize should have been called
> prior to entering this function, and _bt_parallel_release will be
> called prior to return (or this could alternatively be phrased in
> terms of btps_pageStatus on entry/exit).
>
> _bt_readnextpage isn't very clear about the meaning of its blkno
> argument. It looks like it always has to be valid when scanning
> forward, but only in the parallel case when scanning backward? That
> is a little odd. Another, possibly related thing that is odd is that
> when _bt_steppage() finds no tuples and decides to advance to a new
> page again, there's a very short comment in the forward case and a
> very long comment in the backward case:
>
> /* nope, keep going */
> vs.
> /*
> * For parallel scans, get the last page scanned as it is quite
> * possible that by the time we try to fetch previous page,
> other
> * worker has also decided to scan that previous page. We
> could
> * avoid that by doing _bt_parallel_release once we have read
> the
> * current page, but it is bad to make other workers wait till
> we
> * read the page.
> */
>
> Now it seems to me that these cases are symmetric and the issues are
> identical. It's basically that, while we were judging whether the
> current page has useful contents, some other process could have
> advanced the scan (since _bt_readpage un-seized it).
>
> - /* check for deleted page */
> page = BufferGetPage(so->currPos.buf);
> TestForOldSnapshot(scan->xs_snapshot, rel, page);
> opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> + /* check for deleted page */
>
> This is an independent change; committed.
>
> What kind of testing has been done to ensure that this doesn't have
> concurrency bugs? What's the highest degree of parallelism that's
> been tested? Did that test include array keys?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-02-01 07:41:38 Re: [HACKERS] Bug in Physical Replication Slots (at least 9.5)?
Previous Message Thomas Munro 2017-02-01 07:23:45 Re: Parallel tuplesort (for parallel B-Tree index creation)