From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Nasty bug in heap_page_prune |
Date: | 2008-03-06 01:23:46 |
Message-ID: | 9319.1204766626@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
While working on the previously discussed refactoring of
heap_page_prune, I came to the realization that its use of
CacheInvalidateHeapTuple is not just a PANIC risk but simply wrong :-(
The semantics aren't right: inval.c assumes that it's dealing with
transactional invalidations, but what we are dealing with in a redirect
collapse is non-transactional. Once heap_page_prune completes, the
CTID update is a done deal even if the calling transaction rolls back.
However, inval.c will think it doesn't need to broadcast any
invalidations after a failed transaction.
This is fairly difficult to show a self-contained example of, because
it can only occur if VACUUM FULL errors out after doing a page prune,
and there's no very easy way to guarantee that will happen. I resorted
to putting an elog(ERROR) call into vacuum.c right after the scan_heap()
call. With that, it was possible to demonstrate the problem:
regression=# create table foo(f1 int);
CREATE TABLE
regression=# select ctid,relname from pg_class where relname = 'foo';
ctid | relname
--------+---------
(9,32) | foo
(1 row)
-- need a HOT-candidate update to the pg_class row, eg
regression=# alter table foo owner to joe;
ALTER TABLE
-- check that update is on same page, else it's not HOT
regression=# select ctid,relname from pg_class where relname = 'foo';
ctid | relname
--------+---------
(9,33) | foo
(1 row)
-- make sure the updated tuple is in local syscache
regression=# select 'foo'::regclass;
regclass
----------
foo
(1 row)
-- now, in another backend, execute intentionally broken VACUUM FULL pg_class
-- and try to alter the updated tuple again using a syscache-based operation
regression=# alter table foo owner to postgres;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The crash is here:
TRAP: FailedAssertion("!(((lp)->lp_flags == 1))", File: "heapam.c", Line: 2330)
LOG: server process (PID 8967) was terminated by signal 6
LOG: terminating any other active server processes
because it's trying to find the tuple at a CTID that's no longer valid.
Not sure about a clean solution to this. I don't really want to
bastardize inval.c by making it deal with nontransactional semantics,
but there may be no other way.
Or we could forget about letting VACUUM FULL collapse out LP_REDIRECT
pointers, at least in system catalogs. That might be the best answer
for 8.3 in any case; I am not seeing a real fix that I'd care to risk
back-patching. (Note that this is not exactly trivial in itself, since
then vacuum.c would need at least some minimal ability to deal with
LP_REDIRECT entries.)
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2008-03-06 03:23:28 | Re: Problem with site doc search |
Previous Message | Andrew Chernow | 2008-03-06 00:12:57 | Re: libpq type system 0.9a |