Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

From: amul sul <sulamul(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date: 2018-03-09 09:32:21
Message-ID: CAAJ_b96mw5xn5oSQgxpgn5dWFRs1j7OebpHRmXkdSNY+70yYEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

Thanks for your time and the review comments/suggestions.

On Tue, Mar 6, 2018 at 4:53 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2018-02-13 12:41:26 +0530, amul sul wrote:
>> From 08c8c7ece7d9411e70a780dbeed89d81419db6b6 Mon Sep 17 00:00:00 2001
>> From: Amul Sul <sulamul(at)gmail(dot)com>
>> Date: Tue, 13 Feb 2018 12:37:52 +0530
>> Subject: [PATCH 1/2] Invalidate ip_blkid v5
[....]
>
> Very nice and instructive to keep this in a submission's commit message.
>

Thank you.

>
>
>> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
>> index 8a846e7dba..f4560ee9cb 100644
>> --- a/src/backend/access/heap/heapam.c
>> +++ b/src/backend/access/heap/heapam.c
>> @@ -3037,6 +3037,8 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
>> * crosscheck - if not InvalidSnapshot, also check tuple against this
>> * wait - true if should wait for any conflicting update to commit/abort
>> * hufd - output parameter, filled in failure cases (see below)
>> + * row_moved - true iff the tuple is being moved to another partition
>> + * table due to an update of partition key. Otherwise, false.
>> *
>
> I don't think 'row_moved' is a good variable name for this. Moving a row
> in our heap format can mean a lot of things. Maybe 'to_other_part' or
> 'changing_part'?
>

Okay, renamed to 'changing_part' in the attached version.

>
>> + /*
>> + * Sets a block identifier to the InvalidBlockNumber to indicate such an
>> + * update being moved tuple to another partition.
>> + */
>> + if (row_moved)
>> + BlockIdSet(&((tp.t_data->t_ctid).ip_blkid), InvalidBlockNumber);
>
> The parens here are set in a bit werid way. I assume that's from copying
> it out of ItemPointerSet()? Why aren't you just using ItemPointerSetBlockNumber()?
>
>
> I think it'd be better if we followed the example of specultive inserts
> and created an equivalent of HeapTupleHeaderSetSpeculativeToken. That'd
> be a heck of a lot easier to grep for...
>

Added HeapTupleHeaderValidBlockNumber, HeapTupleHeaderSetBlockNumber and
ItemPointerValidBlockNumber macro, but not exactly same as the
HeapTupleHeaderSetSpeculativeToken. Do let me know your thoughts/suggestions.

>
>> @@ -9314,6 +9323,18 @@ heap_mask(char *pagedata, BlockNumber blkno)
>> */
>> if (HeapTupleHeaderIsSpeculative(page_htup))
>> ItemPointerSet(&page_htup->t_ctid, blkno, off);
>> +
>> + /*
>> + * For a deleted tuple, a block identifier is set to the
>
> I think this 'the' is superflous.
>

Fixed in the attached version.

>
>> + * InvalidBlockNumber to indicate that the tuple has been moved to
>> + * another partition due to an update of partition key.
>
> But I think it should be 'the partition key'.
>

Fixed in the attached version.

>
>> + * Like speculative tuple, to ignore any inconsistency set block
>> + * identifier to current block number.
>
> This doesn't quite parse.
>

Tried to explain a little bit more, any help or suggestion to improve it
further will be appreciated.

>
>> + */
>> + if (!BlockNumberIsValid(
>> + BlockIdGetBlockNumber((&((page_htup->t_ctid).ip_blkid)))))
>> + BlockIdSet(&((page_htup->t_ctid).ip_blkid), blkno);
>> }
>
> That formatting looks wrong. I think it should be replaced by a macro
> like mentioned above.
>

Used HeapTupleHeaderValidBlockNumber & HeapTupleHeaderSetBlockNumber
macro in the attached version.

>
>> /*
>> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
>> index 160d941c00..a770531e14 100644
>> --- a/src/backend/commands/trigger.c
>> +++ b/src/backend/commands/trigger.c
>> @@ -3071,6 +3071,11 @@ ltrmark:;
>> ereport(ERROR,
>> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>> errmsg("could not serialize access due to concurrent update")));
>> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> + errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
>
> Yes, given that we repeat this in multiple places, I *definitely* want
> to see this wrapped in a macro with a descriptive name.
>

Used ItemPointerValidBlockNumber macro all such places.

>
>> diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
>> index 7961b4be6a..b07b7092de 100644
>> --- a/src/backend/executor/nodeLockRows.c
>> +++ b/src/backend/executor/nodeLockRows.c
>> @@ -218,6 +218,11 @@ lnext:
>> ereport(ERROR,
>> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>> errmsg("could not serialize access due to concurrent update")));
>> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> + errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
>> +
>
> Why are we using ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE rather than
> ERRCODE_T_R_SERIALIZATION_FAILURE? A lot of frameworks have builtin
> logic to retry serialization failures, and this kind of thing is going
> to resolved by retrying, no?
>

No change, any comments on Amit's response[1]

>
>> diff --git a/src/test/isolation/expected/partition-key-update-1.out b/src/test/isolation/expected/partition-key-update-1.out
>> new file mode 100644
>
> I'd like to see tests that show various interactions with ON CONFLICT.
>

I've added isolation test for ON CONFLICT DO NOTHING case only, ON CONFLICT DO
UPDATE is yet to support for a partitioned table[2]. But one can we do that
with update row movement if can test ON CONFLICT DO UPDATE on the leaf table,
like attached TRIAL-on-conflict-do-update-wip.patch, thoughts?

In addition, I have added invalid block number check at the few places, as
discussed in [3]. Also, added check in heap_lock_updated_tuple,
rewrite_heap_tuple & EvalPlanQualFetch where ItemPointerEquals() used
to conclude tuple has been updated/deleted, but yet to figure out the
way to hit this changes manually, so that marking the patch as wip.

Regards,
Amul

1] https://postgr.es/m/CAA4eK1KFfm4PBbshNSikdru3Qpt8hUoKQWtBYjdVE2R7U9f6iA@mail.gmail.com
2] https://postgr.es/m/20180228004602.cwdyralmg5ejdqkq@alvherre.pgsql
3] https://postgr.es/m/CAAJ_b97BBkRWFowGRs9VNzFykoK0ikGB1yYEsWfYK8xR5enSrw@mail.gmail.com

Attachment Content-Type Size
0001-Invalidate-ip_blkid-v6-wip.patch application/octet-stream 18.7 KB
0002-isolation-tests-v5.patch application/octet-stream 22.6 KB
TRIAL-on-conflict-do-update-wip.patch application/octet-stream 5.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message amul sul 2018-03-09 09:48:43 Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Previous Message Kyotaro HORIGUCHI 2018-03-09 08:40:01 Re: Protect syscache from bloating with negative cache entries