Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, amul sul <sulamul(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date: 2018-03-08 17:25:36
Message-ID: CABOikdOD-ejBeT0rhEBXZ+nJm2wMBf6_xfr2U2+b3the=TKUcg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 8, 2018 at 10:27 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> However, there's no such thing as a free lunch. We can't use the CTID
> field to point to a CTID in another table because there's no room to
> include the identity of the other table in the field. We can't widen
> it to make room because that would break on-disk compatibility and
> bloat our already-too-big tuple headers. So, we cannot make it work
> like it does when the updates are confined to a single partition.
> Therefore, the only options are (1) ignore the problem, and let a
> cross-partition update look entirely like a delete+insert, (2) try to
> throw some error in the case where this introduces user-visible
> anomalies that wouldn't be visible otherwise, or (3) revert update
> tuple routing entirely. I voted for (1), but the consensus was (2).
> I think that (3) will make a lot of people sad; it's a very good
> feature.

I am definitely not suggesting that we do #3, though I agree with Tom that
the option is on the table. Maybe it is the two back-to-back bugs in this
area that worry me and raise questions about how much testing the feature
has received. In addition, making such a significant on-disk change for one
corner case, for which even #1 might be acceptable, seems like a lot. If we
do want to go in that direction, I would suggest considering a patch that I
wrote last year to free up additional bits in the ctid field (as part of
the WARM work). I know Tom did not like that either, but at the very least
it gives us a lot more room for future work, at the same amount of risk.
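
To make the space argument concrete, here is a minimal Python sketch, not
PostgreSQL code. It models a ctid as 6 bytes (a 32-bit block number plus a
16-bit offset number) and shows that a 4-byte table identifier does not fit,
while a single flag bit can be carved out of the offset. The names
MOVED_FLAG, pack_ctid and unpack_ctid are hypothetical, and the choice of
the top offset bit is an assumption made only for illustration; it is not
what the WARM patch or any committed code does.

```python
import struct

# Modelled layout: 32-bit block number + 16-bit offset number = 6 bytes.
CTID_FORMAT = "<IH"
CTID_SIZE = struct.calcsize(CTID_FORMAT)

# Adding a 4-byte table identifier would need 10 bytes, so it cannot fit.
CTID_WITH_TABLE_SIZE = struct.calcsize("<IHI")

# Hypothetical: reserve the top bit of the offset as a
# "row moved to another partition" flag.
MOVED_FLAG = 0x8000
OFFSET_MASK = 0x7FFF


def pack_ctid(block, offset, moved=False):
    """Encode (block, offset) in 6 bytes, optionally setting the flag."""
    if not 0 < offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in the remaining 15 bits")
    raw_offset = offset | (MOVED_FLAG if moved else 0)
    return struct.pack(CTID_FORMAT, block, raw_offset)


def unpack_ctid(data):
    """Decode 6 bytes back into (block, offset, moved)."""
    block, raw_offset = struct.unpack(CTID_FORMAT, data)
    return block, raw_offset & OFFSET_MASK, bool(raw_offset & MOVED_FLAG)


if __name__ == "__main__":
    encoded = pack_ctid(7, 3, moved=True)
    print(len(encoded), unpack_ctid(encoded))
```

The point of the sketch is only that a flag costs no extra bytes, whereas a
pointer into another table does.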

> If we want to have (2), then we've got to have some way to
> mark a tuple that was deleted as part of a cross-partition update, and
> that requires a change to the on-disk format.
>

I think the question is: isn't there an alternate way to achieve the same
result? One way would be to do what I suggested above, i.e. free up more
bits and use one of those. Another way would be to add a hidden column to
the partition table, either when it is created or when it is attached as a
partition. This penalises only the partition tables and keeps the rest of
the system out of it. Obviously, if the column is added when the table is
attached as a partition, rather than at table creation time, then an old
tuple may not have room to store the additional field. Maybe we can handle
that by double updating the tuple? That seems bad, but it only affects the
case where a partition key is updated, and we can clearly document the
performance implications of that operation. I am not sure how common this
case is going to be anyway. With this hidden column, we could even store a
pointer to another partition and do something with that, if it is needed at
all.
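
As a toy model of the hidden-column idea (again a Python sketch under
stated assumptions, not PostgreSQL code): each row carries a hypothetical
hidden field moved_to that is set when a partition-key update moves the row,
and a later updater that reaches the old row checks that field. The names
Row, RowMovedError, move_row and lock_for_update are all invented for this
illustration, and raising an error corresponds to option (2) above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Row:
    key: int
    payload: str
    deleted: bool = False
    # Hypothetical hidden column: (partition name, row index) of the new
    # version, set only when the row was moved by a partition-key update.
    moved_to: Optional[Tuple[str, int]] = None


class RowMovedError(Exception):
    """Raised when a row was moved to another partition by an update."""


def move_row(parts: Dict[str, List[Row]], src: str, idx: int,
             dst: str, new_key: int) -> None:
    """Model a partition-key update as delete + insert plus a marker."""
    old = parts[src][idx]
    parts[dst].append(Row(new_key, old.payload))
    old.deleted = True
    old.moved_to = (dst, len(parts[dst]) - 1)


def lock_for_update(parts: Dict[str, List[Row]], src: str,
                    idx: int) -> Optional[Row]:
    """Return the row if live, None if plainly deleted, error if moved."""
    row = parts[src][idx]
    if row.moved_to is not None:
        raise RowMovedError(f"row moved to {row.moved_to}")
    return None if row.deleted else row


if __name__ == "__main__":
    parts = {"p1": [Row(1, "a")], "p2": []}
    move_row(parts, "p1", 0, "p2", 101)
    try:
        lock_for_update(parts, "p1", 0)
    except RowMovedError as exc:
        print("error:", exc)
```

Because moved_to records the destination, the same marker could be used to
follow the row into the other partition instead of raising an error.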

That's just one idea. Of course, I haven't thought about it for more than
10 minutes, so I have most likely missed some details, and it's probably a
stupid idea after all. But there could be other ideas too. And even if we
can't find one, my vote would be to settle for #1 instead of trying to do
#2.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
