Re: race condition in pg_class

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Smolkin Grigory <smallkeen(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: race condition in pg_class
Date: 2023-10-27 23:26:12
Message-ID: 20231027232612.99.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 27, 2023 at 06:40:55PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > On Fri, Oct 27, 2023 at 03:32:26PM -0400, Tom Lane wrote:
> >> I'm inclined to propose that heap_inplace_update should check to
> >> make sure that it's operating on the latest version of the tuple
> >> (including, I guess, waiting for an uncommitted update?) and throw
> >> error if not. I think this is what your B3 option is, but maybe
> >> I misinterpreted. It might be better to throw error immediately
> >> instead of waiting to see if the other updater commits.
>
> > That's perhaps closer to B2. To be pedantic, B3 was about not failing or
> > waiting for GRANT to commit but instead inplace-updating every member of the
> > update chain. For B2, I was thinking we don't need to error. There are two
> > problematic orders of events. The easy one is heap_inplace_update() mutating
> > a tuple that already has an xmax. That's the one in the test case upthread,
> > and detecting it is trivial. The harder one is heap_inplace_update() mutating
> > a tuple after GRANT fetches the old tuple, before GRANT enters heap_update().
>
> Ugh ... you're right, what I was imagining would not catch that last case.
>
> > I anticipate a new locktag per catalog that can receive inplace updates,
> > i.e. LOCKTAG_RELATION_DEFINITION and LOCKTAG_DATABASE_DEFINITION.
>
> We could perhaps make this work by using the existing tuple-lock
> infrastructure, rather than inventing new locktags (a choice that
> spills to a lot of places including clients that examine pg_locks).

That could be okay. It would be weird to reuse a short-term lock like that
one as something held till end of transaction. But the alternative of new
locktags ain't perfect, as you say.

> I would prefer though to find a solution that only depends on making
> heap_inplace_update protect itself, without high-level cooperation
> from the possibly-interfering updater. This is basically because
> I'm still afraid that we're defining the problem too narrowly.
> For one thing, I have nearly zero confidence that GRANT et al are
> the only problematic source of conflicting transactional updates.

Likewise here, but I have fair confidence that an assertion would flush out
the rest. heap_inplace_update() would assert that the backend holds one of
the acceptable locks. It could even be an elog; heap_inplace_update() can
tolerate that cost.

> For another, I'm worried that some extension may be using
> heap_inplace_update against a catalog we're not considering here.

A pgxn search finds "citus" using heap_inplace_update().

> I'd also like to find a solution that fixes the case of a conflicting
> manual UPDATE (although certainly that's a stretch goal we may not be
> able to reach).

It would be nice.

> I wonder if there's a way for heap_inplace_update to mark the tuple
> header as just-updated in a way that regular heap_update could
> recognize. (For standard catalog updates, we'd then end up erroring
> in simple_heap_update, which I think is fine.) We can't update xmin,
> because the VACUUM callers don't have an XID; but maybe there's some
> other way? I'm speculating about putting a funny value into xmax,
> or something like that, and having heap_update check that what it
> sees in xmax matches what was in the tuple the update started with.

Hmmm. Achieving it without an XID would be the real trick. (With an XID, we
could use xl_heap_lock like heap_update() does.) Thinking out loud, what if
heap_inplace_update() sets HEAP_XMAX_INVALID and xmax =
TransactionIdAdvance(xmax)? Or change t_ctid in a similar way. Then regular
heap_update() could complain if the field changed vs. last seen value. This
feels like something to regret later in terms of limiting our ability to
harness those fields for more-valuable ends or compact them away in a future
page format. I can't pinpoint a specific loss, so the idea might have legs.
Nontransactional data in separate tables or in new metapages smells like the
right long-term state. A project wanting to reuse the tuple header bits could
introduce such storage to unblock its own bit reuse.

> Or we could try to get rid of in-place updates, but that seems like
> a mighty big lift. All of the existing callers, except maybe
> the johnny-come-lately dropdb usage, have solid documented reasons
> to do it that way.

Yes, removing that smells problematic.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-10-28 01:23:29 Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml
Previous Message Jeff Davis 2023-10-27 23:08:37 Re: unnest multirange, returned order