Re: Concurrency bug in UPDATE of partition-key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Concurrency bug in UPDATE of partition-key
Date: 2018-06-25 09:36:58
Message-ID: CAJ3gD9cdsiVwtZxs6WimxRVfmM31P47LFfJN0nPT687gkhHV9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 June 2018 at 15:46, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Jun 22, 2018 at 10:25 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> On 20 June 2018 at 18:54, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>>
>>> 2.
>>> ExecBRDeleteTriggers(..)
>>> {
>>> ..
>>> + /* If requested, pass back the concurrently updated tuple, if any. */
>>> + if (epqslot != NULL)
>>> + *epqslot = newSlot;
>>> +
>>> if (trigtuple == NULL)
>>> return false;
>>> +
>>> + /*
>>> + * If the tuple was concurrently updated and the caller of this
>>> + * function requested for the updated tuple, skip the trigger
>>> + * execution.
>>> + */
>>> + if (newSlot != NULL && epqslot != NULL)
>>> + return false;
>>> ..
>>> }
>>>
>>> Can't we merge the first change into second, something like:
>>>
>>> if (newSlot != NULL && epqslot != NULL)
>>> {
>>> *epqslot = newSlot;
>>> return false;
>>> }
>>
>> We want to update *epqslot with whatever the value of newSlot is. So
>> if newSlot is NULL, epqslot should be NULL. So we can't do that in the
>> "if (newSlot != NULL && epqslot != NULL)" condition.
>>
>
> Why do you need to update when newslot is NULL? Already *epqslot is
> initialized as NULL in the caller (ExecUpdate). I think we only want
> to update it when trigtuple is not null.

But GetTupleForTrigger() updates the epqslot to NULL even when it
returns NULL. So to be consistent with it, we want to do the same
thing for ExecBRDeleteTriggers() : Update the epqslot even when
GetTupleForTrigger() returns NULL.

I think the reason we are doing "*newSlot = NULL" as the very first
thing in the GetTupleForTrigger() code, is so that callers don't have
to initialise newSlot to NULL before calling GetTupleForTrigger. And
so ExecBRUpdateTriggers() does not initialize newSlot with NULL. We
can follow the same approach while calling ExecDelete().

I understand that before calling ExecDelete() epqslot is initialized
to NULL, so it is again not required to do the same inside
GetTupleForTrigger(), but as per my above explanation, it is actually
not necessary to initialize to NULL before calling ExecDelete(). So I
can even remove that initialization.

> Apart from looking bit
> awkward, I think it is error-prone as well because the code of
> GetTupleForTrigger is such that it can return NULL with a valid value
> of newSlot in which case the behavior will become undefined. The case
> which I am worried is when first-time EvalPlanQual returns some valid
> value of epqslot, but in the next execution after heap_lock_tuple,
> returns NULL. In practise, it won't happen because EvalPlanQual locks
> the tuple, so after that heap_lock_tuple shouldn't fail again, but it
> seems prudent to not rely on that unless we need to.

Yes, I had given a thought on exactly this. I think the intention of
the code is to pass back NULL epqslot when there is no concurrently
updated tuple. I think the code in GetTupleForTrigger() is well aware
that EvalPlanQual() will never be called twice. The only reason it
goes back to ltrmark is to re-fetch the locked tuple. The comment also
says so :
/*
* EvalPlanQual already locked the tuple, but we
* re-call heap_lock_tuple anyway as an easy way of
* re-fetching the correct tuple. Speed is hardly a
* criterion in this path anyhow.
*/

So newSlot is supposed to be updated only once.

>
> For now, I have removed it in the attached patch, but if you have any
> valid reason, then we can add back.
>
>>>
>>> 4.
>>> +/* ----------
>>> + * ExecBRDeleteTriggers()
>>> + *
>>> + * Called to execute BEFORE ROW DELETE triggers.
>>> + *
>>> + * Returns false in following scenarios :
>>> + * 1. Trigger function returned NULL.
>>> + * 2. The tuple was concurrently deleted OR it was concurrently updated and the
>>> + * new tuple didn't pass EvalPlanQual() test.
>>> + * 3. The tuple was concurrently updated and the new tuple passed the
>>> + * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, this
>>> + * function skips the trigger function execution, because the caller has
>>> + * indicated that it wants to further process the updated tuple. The updated
>>> + * tuple slot is passed back through epqslot.
>>> + *
>>> + * In all other cases, returns true.
>>> + * ----------
>>> + */
>>> bool
>>> ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
>>>
> ..
>>
>> If it looks complicated, can you please suggest something to make it a
>> bit simpler.
>>
>
> See attached.
>
> Apart from this, I have changed few comments and fixed indentation
> issues. Let me know what you think about attached?

Yes, the changes look good, except for the above one point on which we
haven't concluded yet.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message zafiirah jumeen 2018-06-25 10:00:07 Auto-partitioning in PostgreSQL 10
Previous Message Konstantin Knizhnik 2018-06-25 09:32:23 Re: libpq compression