Re: UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-04-04 11:11:41
Message-ID: CAJ3gD9cOK19LsjFBSBdzN4=pOAn2=mk5+2OEba722ZRRj3SkhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3 April 2017 at 17:13, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hi Amit,
>
> Thanks for updating the patch. Since ddl.sgml got updated on Saturday,
> patch needs a rebase.

Rebased now.

>
>> On 31 March 2017 at 16:54, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>> On 31 March 2017 at 14:04, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> On 2017/03/28 19:12, Amit Khandekar wrote:
>>>>> In the section 5.11 "Partitioning" => subsection 5 "Caveats", I have
>>>>> removed the caveat of not being able to update partition key. And it
>>>>> is now replaced by the caveat where an update/delete operations can
>>>>> silently miss a row when there is a concurrent UPDATE of partition-key
>>>>> happening.
>>>>
>>>> Hmm, how about just removing the "partition-changing updates are
>>>> disallowed" caveat from the list on the 5.11 Partitioning page and explain
>>>> the concurrency-related caveats on the UPDATE reference page?
>>>
>>> IMHO this caveat is better placed in Partitioning chapter to emphasize
>>> that it is a drawback specifically in presence of partitioning.
>
> I mean we fixed things for declarative partitioning so it's no longer a
> caveat like it is for partitioning implemented using inheritance (in that
> the former doesn't require user-defined triggers to implement
> row-movement). Seeing the first sentence, that is:
>
> An <command>UPDATE</> causes a row to move from one partition to another
> if the new value of the row fails to satisfy the implicit partition
> constraint of the original partition but there is another partition which
> can fit this row.
>
> which clearly seems to suggest that row-movement, if required, is handled
> by the system. So it's not clear why it's in this list. If we want to
> describe the limitations of the current implementation, we'll need to
> rephrase it a bit.

Yes I agree.

> How about something like:
> For an <command>UPDATE</> that causes a row to move from one partition to
> another due the partition key being updated, the following caveats exist:
> <a brief description of the possibility of surprising results in the
> presence of concurrent manipulation of the row in question>

Now with the slightly changed doc structuring for partitioning in
latest master, I have described in the end of section "5.10.2.
Declarative Partitioning" this note :

---

"Updating the partition key of a row might cause it to be moved into a
different partition where this row satisfies its partition
constraint."

---

And then in the Limitations section, I have replaced the earlier
can't-update-partition-key limitation with this new limitation as
below :

"When an UPDATE causes a row to move from one partition to another,
there is a chance that another concurrent UPDATE or DELETE misses this
row. Suppose, during the row movement, the row is still visible for
the concurrent session, and it is about to do an UPDATE or DELETE
operation on the same row. This DML operation can silently miss this
row if the row now gets deleted from the partition by the first
session as part of its UPDATE row movement. In such case, the
concurrent UPDATE/DELETE, being unaware of the row movement,
interprets that the row has just been deleted so there is nothing to
be done for this row. Whereas, in the usual case where the table is
not partitioned, or where there is no row movement, the second session
would have identified the newly updated row and carried UPDATE/DELETE
on this new row version."

---

Further, in the Notes section of update.sgml, I have kept a link to
the above limitations section like this :

"In the case of a partitioned table, updating a row might cause it to
no longer satisfy the partition constraint of the containing
partition. In that case, if there is some other partition in the
partition tree for which this row satisfies its partition constraint,
then the row is moved to that partition. If there isn't such a
partition, an error will occur. The error will also occur when
updating a partition directly. Behind the scenes, the row movement is
actually a DELETE and INSERT operation. However, there is a
possibility that a concurrent UPDATE or DELETE on the same row may
miss this row. For details see the section Section 5.10.2.3."

>
>>>> + If an <command>UPDATE</command> on a partitioned table causes a row to
>>>> + move to another partition, it is possible that all row-level
>>>> + <literal>BEFORE</> <command>UPDATE</command> triggers and all row-level
>>>> + <literal>BEFORE</> <command>DELETE</command>/<command>INSERT</command>
>>>> + triggers are applied on the respective partitions in a way that is
>>>> apparent
>>>> + from the final state of the updated row.
>>>>
>>>> How about dropping "it is possible that" from this sentence?
>>>
>>> What the statement means is : "It is true that all triggers are
>>> applied on the respective partitions; but it is possible that they are
>>> applied in a way that is apparent from final state of the updated
>>> row". So "possible" applies to "in a way that is apparent..". It
>>> means, the user should be aware that all the triggers can change the
>>> row and so the final row will be affected by all those triggers.
>>> Actually, we have a similar statement for UPSERT involved with
>>> triggers in the preceding section. I have taken the statement from
>>> there.
>
> I think where it appears in that sentence made me think it could be
> confusing to some. How about reordering sentences in that paragraph so
> that the whole paragraphs reads as follows:
>
> If an UPDATE on a partitioned table causes a row to move to another
> partition, it will be performed as a DELETE from the original partition
> followed by INSERT into the new partition. In this case, all row-level
> BEFORE UPDATE triggers and all row-level BEFORE DELETE triggers are fired
> on the original partition. Then all row-level BEFORE INSERT triggers are
> fired on the destination partition. The possibility of surprising outcomes
> should be considered when all these triggers affect the row being moved.
> As far as AFTER ROW triggers are concerned, AFTER DELETE and AFTER INSERT
> triggers are applied; but AFTER UPDATE triggers are not applied because
> the UPDATE has been converted to a DELETE and INSERT. None of the DELETE
> and INSERT statement-level triggers are fired, even if row movement
> occurs; only the UPDATE triggers of the target table used in the UPDATE
> statement will be fired.

Yeah, most of the above makes sense to me. I have kept the phrase "as
far as statement-level triggers are concerned".

>
> Finally, I forgot to mention during the last review that the new parameter
> 'returning' to ExecDelete() could be called 'process_returning'.

Done, thanks.

Attached updated patch v7 has the above changes.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
update-partition-key_v7.patch application/octet-stream 32.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2017-04-04 11:12:12 Re: Making clausesel.c Smarter
Previous Message Heikki Linnakangas 2017-04-04 10:52:41 Re: Implementation of SASLprep for SCRAM-SHA-256