Re: Fix for parallel BTree initialization bug

From: "Jameson, Hunter 'James'" <hunjmes(at)amazon(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix for parallel BTree initialization bug
Date: 2020-09-10 18:09:19
Message-ID: FA67FF11-8A3C-4D7C-AE4B-A2025CBC67C1@amazon.com
Lists: pgsql-hackers

Answers inline below:

On 9/10/20, 4:58 AM, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com> wrote:

On Tue, Sep 8, 2020 at 11:55 PM Jameson, Hunter 'James'
<hunjmes(at)amazon(dot)com> wrote:
>
> Hi, I ran across a small (but annoying) bug in initializing parallel BTree scans, which causes the parallel-scan state machine to get confused. The fix is one line; the description is a bit longer:
>
>
>
> Before, function _bt_first() would exit immediately if the specified scan keys could never be satisfied--without notifying other parallel workers, if any, that the scan key was done.
>

The first question that comes to mind is how is it possible that for
one of the workers specified scan keys is not satisfied while for
others it is satisfied? I think it is possible when other workers are
still working on the previous scan key and this worker has moved to
the next scan key. If not, then what is the other case?

I think that's right. If I remember correctly, the first worker to move to the next IN-list condition exits _bt_first() early and *locally* advances to the next-next IN-list condition, but doesn't properly advance the global scan key: by allowing the first worker to move on, in this one case, without notifying the other workers, the global key ends up less than the first worker's local key. So the first worker now has a local scan key greater than the global scan key, because it never called _bt_parallel_done().

> This moved that particular worker to a scan key beyond what was in the shared parallel-query state, so that it would later try to read in "InvalidBlockNumber", without recognizing it as a special sentinel value.
>

Now, if it happens as I mentioned, then the other workers should not
try to advance their scan, because their local scan key will be less
than the shared key. Basically, they should return from the below
condition:
_bt_parallel_seize()
{
..
    if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
    {
        /* Parallel scan has already advanced to a new set of scankeys. */
        status = false;
    }
..
}

After this, those workers will also update their scan key and move
forward from there. So, I am not seeing how this could create a
problem.

I think, if I understand my notes on the bug, that the problem is with the first worker, not the other workers. So it doesn't matter that the other workers aren't confused, because the first worker confuses itself: it has moved on without telling anyone else.

--
With Regards,
Amit Kapila.

Thanks,
James
--
James Hunter, Amazon Web Services (AWS)
