Re: UPDATE of partition key

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-06-01 19:47:25
Message-ID: CA+Tgmoa8h_Oxw3A6vP-4G7YiWLByxqsjtK=7D+fB9qKKN9-0iw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 1, 2017 at 7:41 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> Regarding the trigger issue, I can't claim to have a terribly strong
>> opinion on this. I think that practically anything we do here might
>> upset somebody, but probably any halfway-reasonable thing we choose to
>> do will be OK for most people. However, there seems to be a
>> discrepancy between the approach that got the most votes and the one
>> that is implemented by the v8 patch, so that seems like something to
>> fix.
>
> Yes, I have started working on updating the patch to use that approach
> (BR and AR update triggers on source and destination partition
> respectively, instead of delete+insert) The approach taken by the
> patch (BR update + delete+insert triggers) didn't require any changes
> in the way ExecDelete() and ExecInsert() were called. Now we would
> require to skip the delete/insert triggers, so some flags need to be
> passed to these functions, or else have stripped down versions of
> ExecDelete() and ExecInsert() which don't do other things like
> RETURNING handling and firing triggers.

See, that strikes me as a pretty good argument for firing the
DELETE+INSERT triggers...

I'm not wedded to that approach, but "what makes the code simplest?"
is not a bad tiebreak, other things being equal.

>> In terms of the approach taken by the patch itself, it seems
>> surprising to me that the patch only calls
>> ExecSetupPartitionTupleRouting when an update fails the partition
>> constraint. Note that in the insert case, we call that function at
>> the start of execution;
>
>> calling it in the middle seems to involve additional hazards;
>> for example, is it really safe to add additional
>> ResultRelInfos midway through the operation?
>
> I thought since the additional ResultRelInfos go into
> mtstate->mt_partitions which is independent of
> estate->es_result_relations, that should be safe.

I don't know. That sounds scary to me, but it might be OK. Probably
needs more study.

>> Is it safe to take more locks midway through the operation?
>
> I can imagine some rows already updated, when other tasks like ALTER
> TABLE or CREATE INDEX happen on other partitions which are still
> unlocked, and then for row movement we try to lock these other
> partitions and wait for the DDL tasks to complete. But I didn't see
> any particular issues with that. But correct me if you suspect a
> possible issue. One issue can be if we were able to modify the table
> attributes, but I believe we cannot do that for inherited columns.

It's just that it's very unlike what we do anywhere else. I don't
have a real specific idea in mind about what might totally break, but
at a minimum it could certainly cause behavior that can't happen
today. Today, if you run a query on some tables, it will block
waiting for any locks at the beginning of the query, and the query
won't begin executing until it has all of the locks. With this
approach, you might block midway through; you might even deadlock
midway through. Maybe that's not overtly broken, but it's at least
got the possibility of being surprising.

Now, I'd actually kind of like to have behavior like this for other
cases, too. If we're inserting one row, can't we just lock the one
partition into which it needs to get inserted, rather than all of
them? But I'm wary of introducing such behavior incidentally in a
patch whose main goal is to allow UPDATE row movement. Figuring out
what could go wrong and fixing it seems like a substantial project all
of its own.

> The reason I thought it cannot be done at the start of the execution,
> is because even if we know that update is not modifying the partition
> key column, we are not certain that the final NEW row has its
> partition key column unchanged, because of triggers. I understand it
> might be weird for a user requiring to modify a partition key value,
> but if a user does that, it will result in crash because we won't have
> the partition routing setup, thinking that there is no partition key
> column in the UPDATE.

I think we could avoid that issue. Suppose we select the target
partition based only on the original NEW tuple. If a trigger on that
partition subsequently modifies the tuple so that it no longer
satisfies the partition constraint for that partition, just let it
ERROR out normally. Actually, it seems like that's probably the
*easiest* behavior to implement. Otherwise, you might fire triggers,
discover that you need to re-route the tuple, and then ... fire
triggers again on the new partition, which might reroute it again?

>> I'm not sure that it's sensible to allow a trigger on an
>> individual partition to reroute an update to another partition
>> what if we get an infinite loop?)
>
> You mean, if the other table has another trigger that will again route
> to the original partition ? But this infinite loop problem could occur
> even for 2 normal tables ?

How? For a normal trigger, nothing it does can change which table is targeted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-06-01 19:51:12 BEFORE trigger can cause undetected partition constraint violation
Previous Message Jeevan Ladhe 2017-06-01 19:35:03 Re: Adding support for Default partition in partitioning