Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date: 2018-03-08 03:45:15
Message-ID: CABOikdMfs7Ec=i6fCqHr3qej_PmCOf6iHrNK9iUcypRtDCwD_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 28, 2018 at 12:38 PM, Rajkumar Raghuwanshi <
rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:

> On Wed, Feb 14, 2018 at 5:44 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>
>> +# Concurrency error from GetTupleForTrigger
>> +# Concurrency error from ExecLockRows
>>
>> I think you don't need to mention above sentences in spec files.
>> Apart from that, your patch looks good to me. I have marked it as
>> Ready For Committer.
>>
>
> I too have tested this feature with isolation framework and this look good
> to me.
>
>
It looks to me that we are trying to fix only one issue here with
concurrent updates. What happens if a non-partition key is first updated
and then a second session updates the partition key?

For example, with your patches applied:

CREATE TABLE pa_target (key integer, val text)
PARTITION BY LIST (key);
CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);
INSERT INTO pa_target VALUES (1, 'initial1');

session1:
BEGIN;
UPDATE pa_target SET val = val || ' updated by update1' WHERE key = 1;
UPDATE 1
postgres=# SELECT * FROM pa_target ;
key | val
-----+-----------------------------
1 | initial1 *updated by update1*
(1 row)

session2:
UPDATE pa_target SET val = val || ' updated by update2', key = key + 1
WHERE key = 1
<blocks>

session1:
postgres=# COMMIT;
COMMIT

<session1 unblocks and completes its UPDATE>

postgres=# SELECT * FROM pa_target ;
key | val
-----+-----------------------------
2 | initial1 updated by update2
(1 row)

Ouch. The committed updates by session1 are overwritten by session2. This
clearly violates the rules that rest of the system obeys and is not
acceptable IMHO.

Clearly, ExecUpdate() while moving rows between partitions is missing out
on re-constructing the to-be-updated tuple, based on the latest tuple in
the update chain. Instead, it's simply deleting the latest tuple and
inserting a new tuple in the new partition based on the old tuple. That's
simply wrong.

I haven't really thought carefully to see if this should be a separate
patch, but it warrants attention. We should at least think through all
different concurrency aspects of partition key updates and think about a
holistic solution, instead of fixing one problem at a time. This probably
also shows that isolation tests for partition key updates are either
missing (I haven't checked) or they need more work.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2018-03-08 03:47:48 Re: Protect syscache from bloating with negative cache entries
Previous Message Robert Haas 2018-03-08 03:30:24 Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.