Fix for parallel BTree initialization bug

From: "Jameson, Hunter 'James'" <hunjmes(at)amazon(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Fix for parallel BTree initialization bug
Date: 2020-09-08 18:25:03
Message-ID: 4248CABC-25E3-4809-B4D0-128E1BAABC3C@amazon.com
Lists: pgsql-hackers

Hi, I ran across a small (but annoying) bug in initializing parallel BTree scans, which causes the parallel-scan state machine to get confused. The fix is one line; the description is a bit longer:

Before the fix, function _bt_first() would exit immediately if the specified scan keys could never be satisfied, without notifying the other parallel workers, if any, that the scan key was done. That worker then moved on to a scan key beyond the one recorded in the shared parallel-query state, so that it would later try to read block "InvalidBlockNumber" without recognizing it as a special sentinel value.

The basic bug is that the BTree parallel query state machine assumes that a worker process is working on a key <= the global key--a worker process can be behind (i.e., hasn't finished its work on a previous key), but never ahead. By allowing the first worker to move on to the next scan key, in this one case, without notifying other workers, the global key ends up < the first worker's local key.

Symptoms of the bug are: on a read-only (R/O) instance, we get an error saying we can't extend the index relation, while on a read-write (R/W) instance we just extend the index relation by 1 block.

To reproduce, you need a query that:

1. Executes a parallel BTree index scan;
2. Has an IN-list of size > 1;
3. Has an additional index filter that makes it impossible to satisfy the first IN-list condition.

(We encountered such a query, and therefore the bug, on a production instance.)

Thanks,
James

--
James Hunter, Amazon Web Services (AWS)

Attachment Content-Type Size
0001-Fix-initialization-of-parallel-BTree-scan.patch application/octet-stream 1.6 KB
