I discovered $subject while continuing to test my patch. The problem
occurs because many parts of the backend are in the habit of releasing
lock on system catalogs as soon as they've updated whatever they wanted
to update. This means that we can encounter INSERT_IN_PROGRESS or
DELETE_IN_PROGRESS tuples, even though we have exclusive lock on the
table. That's no big problem as far as copy_heap_data goes; it can
just transfer the rows anyway. However, when we get to the point
of reindexing the catalog, IndexBuildHeapScan will wait for commit of
whichever transaction did the deed. Since we're still holding exclusive
lock on the catalog, it's entirely likely for that transaction to need
to read or write the catalog again, whereupon deadlock.
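To make the cycle concrete, here's a toy wait-for-graph model in Python
(all names invented for illustration; this is not backend code). One edge
is IndexBuildHeapScan waiting for the in-progress transaction to finish,
the other is that transaction waiting for the catalog lock we still hold:

```python
# Minimal wait-for-graph sketch of the deadlock described above.
# Each edge means "X is waiting for Y"; a cycle means deadlock.
waits_for = {
    # IndexBuildHeapScan, run by our VACUUM FULL, blocks until the
    # transaction that left an INSERT_IN_PROGRESS tuple commits/aborts:
    "vacuum-full-backend": "other-xact",
    # That transaction needs the catalog again, but VACUUM FULL still
    # holds exclusive lock on it:
    "other-xact": "vacuum-full-backend",
}

def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle if we revisit a node."""
    seen, node = [], start
    while node in graph:
        if node in seen:
            return seen[seen.index(node):]
        seen.append(node)
        node = graph[node]
    return None

print(find_cycle(waits_for, "vacuum-full-backend"))
```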
There are a couple of things we could try to do about this, other than
just living with it:
1. Try to avoid releasing locks early in system catalog operations.
The trouble with this is that it would inevitably create deadlock
failures of its own, due to adding lock-upgrade situations where there
were none before. That is, instead of patterns like
Acquire(RowExclusiveLock), then Release(RowExclusiveLock), we'd have
Acquire(AccessShareLock), Acquire(RowExclusiveLock), then release; which
could deadlock against a parallel acquisition of higher-level locks.
Maybe we could make it work reasonably well if we release
AccessShareLock quickly after a read but always hold RowExclusiveLock
once acquired. Not sure.
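As a toy illustration of why the share-then-upgrade pattern is deadlock-prone
(invented names, not the real lock manager, but it queues conflicting
requests FIFO much as the backend does):

```python
# Toy lock table with FIFO queuing: a request is granted only if it
# conflicts with no current holder AND no earlier waiter.
CONFLICTS = {
    ("AccessShare", "AccessExclusive"), ("AccessExclusive", "AccessShare"),
    ("RowExclusive", "AccessExclusive"), ("AccessExclusive", "RowExclusive"),
    ("AccessExclusive", "AccessExclusive"),
}

holders, wait_queue = [], []

def request(backend, mode):
    """Grant if compatible with all holders and queued waiters; else queue.
    Returns the list of blockers (empty means granted)."""
    blockers = [(b, m) for b, m in holders + wait_queue
                if b != backend and (mode, m) in CONFLICTS]
    if blockers:
        wait_queue.append((backend, mode))
    else:
        holders.append((backend, mode))
    return blockers

# Backend A reads a catalog: AccessShareLock granted immediately.
assert request("A", "AccessShare") == []
# Backend B asks for AccessExclusiveLock: queued behind A's share lock.
assert request("B", "AccessExclusive") == [("A", "AccessShare")]
# A now tries to upgrade to RowExclusiveLock: it queues behind B's
# pending exclusive request, while B waits on A's held lock -- deadlock.
assert request("A", "RowExclusive") == [("B", "AccessExclusive")]
```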
2. In VAC FULL and CLUSTER, tell index rebuild not to wait around for
INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples. I believe that the
price of this would be not re-verifying the integrity of unique indexes.
Which is kind of annoying, but then again rechecking data consistency
is not the purpose of these commands. It would not be too unreasonable
to tell people to use REINDEX if they suspect the validity of an index
--- that takes a significantly weaker lock and wouldn't be nearly as
big a problem to use on system catalogs.
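Roughly, #2 would change the tuple disposition in the rebuild scan along
these lines (hypothetical sketch, not the actual IndexBuildHeapScan code
or its real parameter names):

```python
# Hypothetical sketch of option #2: let the caller tell the index-rebuild
# scan not to wait out concurrent transactions.  The cost is that we can
# no longer re-verify uniqueness, so that check would be skipped too.
def index_build_disposition(tuple_state, wait_for_concurrent_xacts):
    """Decide what to do with a heap tuple during index rebuild."""
    if tuple_state in ("INSERT_IN_PROGRESS", "DELETE_IN_PROGRESS"):
        if wait_for_concurrent_xacts:
            return "wait-for-xact-commit"    # today's behavior: can deadlock
        return "index-without-unique-check"  # option #2: include it, don't wait
    if tuple_state == "DEAD":
        return "skip"
    return "index"                           # live tuple: index normally
```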
3. Try to ameliorate the problem by making sure it's the VACUUM FULL
that fails, and not the other process, when there's a deadlock.
BTW, the reason we don't see this type of problem with the existing
VAC FULL INPLACE logic is that it essentially silently adopts #2
--- it suppresses uniqueness checks even when it's making new indexes.
At the moment #2 is seeming like the most attractive alternative;
both of the others look like research projects. Comments?
regards, tom lane