Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date: 2018-03-08 16:57:05
Message-ID: CA+Tgmob5gn1oKCjBUu8t7Go4h_AZfN5yy22N_Jv0OK8advJoWA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 8, 2018 at 10:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
>> I am actually very surprised that 0001-Invalidate-ip_blkid-v5.patch does
>> not do anything to deal with the fact that t_ctid may no longer point to
>> itself to mark end of the chain. I just can't see how that would work.
>> ...
>> I am actually worried that we're tinkering with ip_blkid to handle one
>> corner case of detecting partition key update. This is going to change
>> on-disk format and probably need more careful attention.
>
> You know, either one of those alone would be scary as hell. Both in
> one patch seem to me to be sufficient reason to reject it outright.
> Not only will it be an unending source of bugs, but it's chewing up
> far too much of what few remaining degrees-of-freedom we have in the
> on-disk format ... for a single purpose that hasn't even been sold as
> something we have to have.

I agree that it isn't clear that it's worth making a change to the
on-disk format for this feature. I made the argument when it was
first proposed that we should just document that there would be
anomalies with cross-partition updates that didn't occur otherwise.
However, multiple people thought that it was worth burning one of our
precious few remaining infomask bits in order to throw an error in
that case rather than just silently having an anomaly, and that's why
this patch got written. It's not too late to decide that we'd rather
not do that after all.

However, there's no such thing as a free lunch. We can't use the CTID
field to point to a CTID in another table because there's no room to
include the identity of the other table in the field.  We can't widen
it to make room because that would break on-disk compatibility and
bloat our already-too-big tuple headers. So, we cannot make it work
like it does when the updates are confined to a single partition.
Therefore, the only options are (1) ignore the problem, and let a
cross-partition update look entirely like a delete+insert, (2) try to
throw some error in the case where this introduces user-visible
anomalies that wouldn't be visible otherwise, or (3) revert update
tuple routing entirely. I voted for (1), but the consensus was (2).
I think that (3) will make a lot of people sad; it's a very good
feature. If we want to have (2), then we've got to have some way to
mark a tuple that was deleted as part of a cross-partition update, and
that requires a change to the on-disk format.
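
To make the space constraint concrete, here is a minimal sketch of the
idea behind option (2). It is not the patch itself. The struct
definitions are simplified stand-ins modeled on PostgreSQL's
BlockIdData and ItemPointerData, and the helper names (ctid_set,
ctid_mark_moved_partitions, and so on) are hypothetical, chosen only
for illustration. The sketch assumes the marker is an invalid block
number stored in ip_blkid, as the patch name suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's tuple identifier types. */
typedef struct { uint16_t bi_hi; uint16_t bi_lo; } BlockIdData;
typedef struct { BlockIdData ip_blkid; uint16_t ip_posid; } ItemPointerData;

/* Block number + offset only: there is no room for a table identity. */
_Static_assert(sizeof(ItemPointerData) == 6, "tuple identifier is 6 bytes");

#define InvalidBlockNumber ((uint32_t) 0xFFFFFFFF)

/* Hypothetical helper: reassemble the 32-bit block number. */
uint32_t ctid_get_block(const ItemPointerData *tid)
{
    return ((uint32_t) tid->ip_blkid.bi_hi << 16) | tid->ip_blkid.bi_lo;
}

/* Hypothetical helper: store a block number and a 1-based offset. */
void ctid_set(ItemPointerData *tid, uint32_t blk, uint16_t off)
{
    assert(off != 0);               /* offset 0 is not a valid line pointer */
    tid->ip_blkid.bi_hi = (uint16_t) (blk >> 16);
    tid->ip_blkid.bi_lo = (uint16_t) (blk & 0xFFFF);
    tid->ip_posid = off;
}

/* Assumption: a cross-partition delete is marked by an invalid block number. */
void ctid_mark_moved_partitions(ItemPointerData *t_ctid)
{
    t_ctid->ip_blkid.bi_hi = 0xFFFF;
    t_ctid->ip_blkid.bi_lo = 0xFFFF;
}

bool ctid_moved_partitions(const ItemPointerData *t_ctid)
{
    return ctid_get_block(t_ctid) == InvalidBlockNumber;
}

/* Traditional end-of-chain test: t_ctid points back at the tuple itself. */
bool ctid_is_chain_end(const ItemPointerData *self,
                       const ItemPointerData *t_ctid)
{
    return ctid_get_block(self) == ctid_get_block(t_ctid) &&
           self->ip_posid == t_ctid->ip_posid;
}
```

Under these assumptions, a marked tuple no longer satisfies the
self-pointer test, so any code that follows the update chain would
have to check the marker before trusting t_ctid. That is why the
marker and the end-of-chain behavior are a single change rather than
two independent ones.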

In short, the two things that you are claiming are prohibitively scary
if done in the same patch look to me like they're actually just one
thing, and that one thing is something which absolutely has to be done
in order to implement the design most community members favored in the
original discussion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
