VACUUM FULL versus unsafe order-of-operations in DDL commands

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: VACUUM FULL versus unsafe order-of-operations in DDL commands
Date: 2011-08-14 18:21:56
Message-ID: 29144.1313346116@sss.pgh.pa.us
Lists: pgsql-hackers

So, as the testing rolls on, I started to see some failures in various
ALTER-FOREIGN-thingy commands. The cause proved to be that numerous
places in foreigncmds.c do this:

    tuple = SearchSysCacheCopy(...);

    ... alter the tuple as needed ...

    rel = heap_open(target-catalog, RowExclusiveLock);
    simple_heap_update(rel, &tuple->t_self, tuple);
    heap_close(rel, RowExclusiveLock);

rather than the more common pattern in which the catalog is opened
first. I confess to not having realized this myself (or if I ever did
know it, I'd forgotten), but *the above coding pattern is not safe*.
You must get your lock on the catalog *before* looking up the target
tuple, else its TID may be obsoleted by a concurrent VACUUM FULL before
you've obtained the lock on the catalog. Both update and delete
operations are at risk in this way.
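
The ordering hazard described above can be illustrated with a small, self-contained toy model in C. This is only a sketch and not PostgreSQL code: every name in it (ToyCatalog, toy_lookup, toy_vacuum_full, and so on) is hypothetical, a slot index stands in for a TID, and a boolean stands in for holding RowExclusiveLock. It assumes that the compaction step cannot run while that lock is held, which mirrors VACUUM FULL needing a conflicting exclusive lock. How the stale TID manifests here (the update lands on a dead slot) is an artifact of the toy, not a claim about how simple_heap_update fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_SLOTS 4

/* Hypothetical stand-in for a catalog tuple; its slot index acts as the TID. */
typedef struct {
    bool live;
    int  oid;
    char owner[16];
} ToyTuple;

typedef struct {
    ToyTuple slots[TOY_SLOTS];
    bool     locked;            /* models holding RowExclusiveLock */
} ToyCatalog;

void toy_set(ToyTuple *t, int oid, const char *owner)
{
    t->live = true;
    t->oid = oid;
    strncpy(t->owner, owner, sizeof(t->owner) - 1);
    t->owner[sizeof(t->owner) - 1] = '\0';
}

/* Slots 0 and 3 start out dead, so compaction will move both live tuples. */
void toy_init(ToyCatalog *cat)
{
    memset(cat, 0, sizeof(*cat));
    toy_set(&cat->slots[1], 100, "alice");
    toy_set(&cat->slots[2], 200, "bob");
}

/* Models the syscache lookup: returns the current slot ("TID") or -1. */
int toy_lookup(const ToyCatalog *cat, int oid)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (cat->slots[i].live && cat->slots[i].oid == oid)
            return i;
    return -1;
}

/* Models VACUUM FULL: moves live tuples to the front, changing their TIDs.
 * Returns false (a real backend would block instead) if the lock is held. */
bool toy_vacuum_full(ToyCatalog *cat)
{
    if (cat->locked)
        return false;
    int dst = 0;
    for (int src = 0; src < TOY_SLOTS; src++) {
        if (!cat->slots[src].live)
            continue;
        if (dst != src) {
            cat->slots[dst] = cat->slots[src];
            cat->slots[src].live = false;
        }
        dst++;
    }
    return true;
}

/* Models an update by TID: rewrites whatever sits in that slot. */
void toy_update(ToyCatalog *cat, int tid, const char *owner)
{
    assert(cat->locked);
    assert(tid >= 0 && tid < TOY_SLOTS);
    strncpy(cat->slots[tid].owner, owner, sizeof(cat->slots[tid].owner) - 1);
    cat->slots[tid].owner[sizeof(cat->slots[tid].owner) - 1] = '\0';
}

/* Unsafe order: look up first, lock afterwards. */
void alter_owner_unsafe(ToyCatalog *cat, int oid, const char *owner)
{
    int tid = toy_lookup(cat, oid);
    toy_vacuum_full(cat);       /* nothing stops it: no lock held yet */
    cat->locked = true;
    toy_update(cat, tid, owner);
    cat->locked = false;
}

/* Safe order: lock first, then look up. */
void alter_owner_safe(ToyCatalog *cat, int oid, const char *owner)
{
    cat->locked = true;
    int tid = toy_lookup(cat, oid);
    toy_vacuum_full(cat);       /* refused: lock already held */
    toy_update(cat, tid, owner);
    cat->locked = false;
}
```

In this model, calling alter_owner_unsafe(&cat, 200, "carol") on a freshly initialized catalog leaves the live tuple for oid 200 still owned by "bob", because the update went to the slot the tuple used to occupy. With alter_owner_safe the compaction is held off and the same call changes the owner to "carol".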

foreigncmds.c is not hard to fix, but the scary aspect of this is the
possibility that we've made the same mistake elsewhere, or might do so
again in future. Some desultory examination of simple_heap_update and
simple_heap_delete calls didn't find any other instances, but I am not
sure I didn't miss anything. And this seems like an easy trap to fall
into when refactoring (the current work to try to unify operations like
ALTER OWNER could easily get into this kind of problem, for instance).

I tried to think of some practical way to mechanically test for this
type of error, but came up with nothing. Any ideas?

regards, tom lane
