Concurrency bug in UPDATE of partition-key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Concurrency bug in UPDATE of partition-key
Date: 2018-03-08 11:25:14
Message-ID: CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(Mail subject changed; original thread : [1])

On 8 March 2018 at 11:57, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 8 March 2018 at 09:15, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>>
>> CREATE TABLE pa_target (key integer, val text)
>> PARTITION BY LIST (key);
>> CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
>> CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);
>> INSERT INTO pa_target VALUES (1, 'initial1');
>>
>> session1:
>> BEGIN;
>> UPDATE pa_target SET val = val || ' updated by update1' WHERE key = 1;
>> UPDATE 1
>> postgres=# SELECT * FROM pa_target ;
>> key | val
>> -----+-----------------------------
>> 1 | initial1 updated by update1
>> (1 row)
>>
>> session2:
>> UPDATE pa_target SET val = val || ' updated by update2', key = key + 1 WHERE
>> key = 1
>> <blocks>
>>
>> session1:
>> postgres=# COMMIT;
>> COMMIT
>>
>> <session1 unblocks and completes its UPDATE>
>>
>> postgres=# SELECT * FROM pa_target ;
>> key | val
>> -----+-----------------------------
>> 2 | initial1 updated by update2
>> (1 row)
>>
>> Ouch. The committed updates by session1 are overwritten by session2. This
>> clearly violates the rules that rest of the system obeys and is not
>> acceptable IMHO.
>>
>> Clearly, ExecUpdate() while moving rows between partitions is missing out on
>> re-constructing the to-be-updated tuple, based on the latest tuple in the
>> update chain. Instead, it's simply deleting the latest tuple and inserting a
>> new tuple in the new partition based on the old tuple. That's simply wrong.
>
> You are right. This need to be fixed. This is a different issue than
> the particular one that is being worked upon in this thread, and both
> these issues have different fixes.
>

As discussed above, there is a concurrency bug where during UPDATE of
a partition-key, another transaction T2 updates the same row.

> Like you said, the tuple needs to be reconstructed when ExecDelete()
> finds that the row has been updated by another transaction. We should
> send back this information from ExecDelete() (I think tupleid
> parameter gets updated in this case), and then in ExecUpdate() we
> should goto lreplace, so that the the row is fetched back similar to
> how it happens when heap_update() knows that the tuple was updated.

The attached WIP patch is how the fix is turning out to be.
ExecDelete() passes back the epqslot returned by EvalPlanQual().
ExecUpdate() uses this re-fetched tuple to re-process the row, just
like it does when heap_update() returns HeapTupleUpdated.

I need to write an isolation test for this fix.

[1] : https://www.postgresql.org/message-id/CAJ3gD9d%3DwcbF-G00bYo4v5iWC69zm1UTSNxyRLpOgO9j4AtLWQ%40mail.gmail.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
fix_concurrency_bug_WIP.patch application/octet-stream 3.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-03-08 11:33:03 Re: JIT compiling with LLVM v11
Previous Message Pavel Stehule 2018-03-08 10:21:10 Re: csv format for psql