Re: ALTER TABLE ALTER COLUMN SET TYPE crash

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robins Tharakan <tharakan(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: ALTER TABLE ALTER COLUMN SET TYPE crash
Date: 2020-08-26 20:25:08
Message-ID: 2944490.1598473508@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I wrote:
> The core issue here is that the table's catalog entries, as visible within
> this transaction, don't match what's in its disk file. ALTER TABLE knows
> that and is careful not to touch the old table using the new rowtype ---
> but nothing else knows that. So I think we need to block other code
> from touching the table-under-modification. As you say, there's not
> any existing infrastructure that will serve for that directly. We might
> be able to invent something comparable to the existing relcache entry
> refcount, but counting exclusive opens ("there can be only one").

I initially tried to fix this with a hard restriction that you can't open
a relation that's marked as being exclusively accessed, as in the 0001
patch below. That crashed and burned most spectacularly. To understand
the second attempt, it's helpful to enumerate some of the problems:

1. We have felt free to allow ALTER TABLE and similar commands to call
subroutines that re-open the target relation. Refactoring to the point
where that wouldn't happen (by means of always passing down an open
Relation) seems entirely impractical.

2. Keeping the exclusive-access state in relcache entries, as I first
tried to do, simply doesn't work, because ALTER TABLE doesn't
hold the relcache entry open throughout --- for instance, the initial
pin is released just after Phase 1 in ATController. I did try not
doing that, but it breaks other things. Besides I'm pretty sure that
that's designed that way intentionally, to reduce the number of forced
rebuilds of the relcache entry.

3. We can't have CheckTableNotInUse enforcing this either, because trying
to do so fails in numerous situation. ALTER TABLE itself is recursive to
some extent, and it turns out that most of the other callers neither need
nor want any such restriction. For instance there's actually a regression
test that fails if TRUNCATE tries to lock out accesses by called code.

In view of point 1, I have set this up so that we only check for
disallowed re-entrancy in the parse/rewrite/plan/execute code path.
(The checks added there are in the places where we first acquire lock
on any given relation.) Perhaps there are other places where we need
to check, but this at least plugs the primary hole.

In view of point 2, there's a new separate hashtable that holds the
exclusive-marking state. The extra cycles to probe it are slightly
annoying, but in the big scheme of things I doubt it's measurable.
(In any case, in a typical session that has never done ALTER TABLE,
no lookups are required.)

As for point 3, right now only ALTER TABLE is applying such marking
at all, and it's careful to release it as soon as consistency has
been restored. I looked through the other CheckTableNotInUse
callers and tentatively concluded that none of the other ones have
a problem; but more eyeballs on that would be a good thing.

Anyway, 0001 is just for amusement's sake; 0002 is the proposed patch.

regards, tom lane

Attachment Content-Type Size
0001-exclusive-access-FAIL.patch text/x-diff 3.0 KB
0002-exclusive-access-checks.patch text/x-diff 14.6 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2020-08-26 20:44:51 Re: ALTER TABLE ALTER COLUMN SET TYPE crash
Previous Message PG Bug reporting form 2020-08-26 20:11:16 BUG #16594: DROP INDEX CONCURRENTLY fails on partitioned table with a non helpful error message.