Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

From: Andres Freund <andres(at)anarazel(dot)de>
To: amul sul <sulamul(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date: 2018-03-05 23:23:53
Message-ID: 20180305232353.gpue7jldnm4bjf4i@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-02-13 12:41:26 +0530, amul sul wrote:
> From 08c8c7ece7d9411e70a780dbeed89d81419db6b6 Mon Sep 17 00:00:00 2001
> From: Amul Sul <sulamul(at)gmail(dot)com>
> Date: Tue, 13 Feb 2018 12:37:52 +0530
> Subject: [PATCH 1/2] Invalidate ip_blkid v5
>
> v5:
> - Added code in heap_mask to skip wal_consistency_checking[7]
> - Fixed previous todos.
>
> v5-wip2:
> - Minor changes in RelationFindReplTupleByIndex() and
> RelationFindReplTupleSeq()
>
> - TODO;
> Same as the privious
>
> v5-wip: Update w.r.t Amit Kapila's comments[6].
> - Reverted error message in nodeModifyTable.c from 'tuple to be locked'
> to 'tuple to be updated'.
>
> - TODO:
> 1. Yet to made a decision of having LOG/ELOG/ASSERT in the
> RelationFindReplTupleByIndex() and RelationFindReplTupleSeq().
>
> v4: Rebased on "UPDATE of partition key v35" patch[5].
>
> v3: Update w.r.t Amit Kapila's[3] & Alvaro Herrera[4] comments
> - typo in all error message and comment : "to an another" -> "to another"
> - error message change : "tuple to be updated" -> "tuple to be locked"
> - In ExecOnConflictUpdate(), error report converted into assert &
> comments added.
>
> v2: Updated w.r.t Robert review comments[2]
> - Updated couple of comment of heap_delete argument and ItemPointerData
> - Added same concurrent update error logic in ExecOnConflictUpdate,
> RelationFindReplTupleByIndex and RelationFindReplTupleSeq
>
> v1: Initial version -- as per Amit Kapila's suggestions[1]
> - When tuple is being moved to another partition then ip_blkid in the
> tuple header mark to InvalidBlockNumber.

Very nice and instructive to keep this in a submission's commit message.

> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 8a846e7dba..f4560ee9cb 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -3037,6 +3037,8 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
> * crosscheck - if not InvalidSnapshot, also check tuple against this
> * wait - true if should wait for any conflicting update to commit/abort
> * hufd - output parameter, filled in failure cases (see below)
> + * row_moved - true iff the tuple is being moved to another partition
> + * table due to an update of partition key. Otherwise, false.
> *

I don't think 'row_moved' is a good variable name for this. Moving a row
in our heap format can mean a lot of things. Maybe 'to_other_part' or
'changing_part'?

> + /*
> + * Sets a block identifier to the InvalidBlockNumber to indicate such an
> + * update being moved tuple to another partition.
> + */
> + if (row_moved)
> + BlockIdSet(&((tp.t_data->t_ctid).ip_blkid), InvalidBlockNumber);

The parens here are set in a bit werid way. I assume that's from copying
it out of ItemPointerSet()? Why aren't you just using ItemPointerSetBlockNumber()?

I think it'd be better if we followed the example of specultive inserts
and created an equivalent of HeapTupleHeaderSetSpeculativeToken. That'd
be a heck of a lot easier to grep for...

> @@ -9314,6 +9323,18 @@ heap_mask(char *pagedata, BlockNumber blkno)
> */
> if (HeapTupleHeaderIsSpeculative(page_htup))
> ItemPointerSet(&page_htup->t_ctid, blkno, off);
> +
> + /*
> + * For a deleted tuple, a block identifier is set to the

I think this 'the' is superflous.

> + * InvalidBlockNumber to indicate that the tuple has been moved to
> + * another partition due to an update of partition key.

But I think it should be 'the partition key'.

> + * Like speculative tuple, to ignore any inconsistency set block
> + * identifier to current block number.

This doesn't quite parse.

> + */
> + if (!BlockNumberIsValid(
> + BlockIdGetBlockNumber((&((page_htup->t_ctid).ip_blkid)))))
> + BlockIdSet(&((page_htup->t_ctid).ip_blkid), blkno);
> }

That formatting looks wrong. I think it should be replaced by a macro
like mentioned above.

> /*
> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> index 160d941c00..a770531e14 100644
> --- a/src/backend/commands/trigger.c
> +++ b/src/backend/commands/trigger.c
> @@ -3071,6 +3071,11 @@ ltrmark:;
> ereport(ERROR,
> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> errmsg("could not serialize access due to concurrent update")));
> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("tuple to be locked was already moved to another partition due to concurrent update")));

Yes, given that we repeat this in multiple places, I *definitely* want
to see this wrapped in a macro with a descriptive name.

> diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
> index 7961b4be6a..b07b7092de 100644
> --- a/src/backend/executor/nodeLockRows.c
> +++ b/src/backend/executor/nodeLockRows.c
> @@ -218,6 +218,11 @@ lnext:
> ereport(ERROR,
> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> errmsg("could not serialize access due to concurrent update")));
> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
> +

Why are we using ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE rather than
ERRCODE_T_R_SERIALIZATION_FAILURE? A lot of frameworks have builtin
logic to retry serialization failures, and this kind of thing is going
to resolved by retrying, no?

> diff --git a/src/test/isolation/expected/partition-key-update-1.out b/src/test/isolation/expected/partition-key-update-1.out
> new file mode 100644

I'd like to see tests that show various interactions with ON CONFLICT.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-03-05 23:32:31 Re: [HACKERS] Creating backup history files for backups taken from standbys
Previous Message Thomas Munro 2018-03-05 23:22:54 Re: Weird failures on lorikeet