Forward zeroing of pg_clog

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Forward zeroing of pg_clog
Date: 2004-08-30 18:19:49
Message-ID: 8958.1093889989@sss.pgh.pa.us
Lists: pgsql-hackers

I just spent some time chasing weird failures ("PANIC: cannot abort
transaction 201109, it was already committed" after some but not all
errors) which I eventually realized were because pg_clog contained
commit and abort flags for several thousand transactions ahead of where
the current XID counter is in my test database.

How did it get that way? Well, yesterday I was testing the XLOG mods to
support huge COMMIT records, so I ran a test script that would commit a
transaction with 20000 subcommitted subtransactions. Then I kill -9'd
the backend to force WAL replay of that large transaction.

WAL replay sets the XID counter to one more than the largest XID that it
sees evidence of in the replayed log. However, it's not looking inside
the COMMIT or ABORT records, and so in this case the largest XID it saw
was that of the parent transaction. The actual pre-crash XID counter
was of course 20000 more than that.

This particular issue is just a simple oversight in xact_redo, and it's
easily fixed: make sure nextXID gets advanced past all of the committed
or aborted subXIDs too.
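
As a rough illustration of that fix, here is a minimal standalone sketch, not the actual xact_redo code. It assumes the COMMIT or ABORT record carries an array of subtransaction XIDs; the names advance_next_xid_past, xid_follows_or_equals and xid_advance are hypothetical, and the comparison is a simplified modulo-2^32 test that ignores the special low XIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption: XIDs below this value are reserved and never assigned. */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Wraparound-aware "a is at or after b" (simplified modulo-2^32 test). */
static int
xid_follows_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) >= 0;
}

/* Return the XID following xid, skipping reserved values on wraparound. */
static TransactionId
xid_advance(TransactionId xid)
{
    xid++;
    if (xid < FIRST_NORMAL_XID)
        xid = FIRST_NORMAL_XID;
    return xid;
}

/*
 * Hypothetical helper: given the counter value replay has settled on so far,
 * push it past the parent XID and every subXID listed in one COMMIT/ABORT
 * record. Returns the (possibly unchanged) counter.
 */
TransactionId
advance_next_xid_past(TransactionId next_xid, TransactionId parent,
                      const TransactionId *subxids, size_t nsubxids)
{
    size_t i;

    assert(subxids != NULL || nsubxids == 0);

    if (xid_follows_or_equals(parent, next_xid))
        next_xid = xid_advance(parent);

    for (i = 0; i < nsubxids; i++)
    {
        if (xid_follows_or_equals(subxids[i], next_xid))
            next_xid = xid_advance(subxids[i]);
    }
    return next_xid;
}
```

With the counter left at parent + 1 by the old logic, running this over the record's subXID list moves it past the highest subtransaction, which is where the pre-crash counter must have been at a minimum.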

But thinking about it, I realized that we have some other issues in the
same area. Because subxact commit sets clog bits but emits no WAL
record, it's at least theoretically possible that post-crash there will
be written-out clog bits for XIDs ahead of every XID of which there is
any record in the WAL data. RecordTransactionCommit and friends have
other cases in which they think it's sufficient to write a clog entry
and no WAL entry. Perhaps that's broken, but I think the cleanest fix
is that the clog code ought to forcibly zero all clog entries ahead of
whatever nextXID is settled on by WAL replay. Otherwise we run some
risk of subtransactions that are still running looking like they are
subcommitted (or worse) in the clog data.

This is already true at the page level: when advancing into a new page
we zero it instead of reading anything from disk. I am thinking of
adding code to StartupCLOG to zero the remaining portion of the
"current" page too.
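
To make the idea concrete, here is a minimal standalone sketch of zeroing the tail of the current page. It is not the StartupCLOG code; it assumes two status bits per transaction, four transactions per byte with lower XIDs in the lower-order bits, an 8192-byte page, and a zero status meaning "in progress". The names zero_clog_from and clog_get are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Assumed layout: 2 status bits per XID, 4 XIDs per byte, 8192-byte pages. */
#define CLOG_PAGE_SIZE      8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (CLOG_PAGE_SIZE * CLOG_XACTS_PER_BYTE)

/* Read the 2-bit status of xid from the page that contains it. */
unsigned
clog_get(const unsigned char *page, TransactionId xid)
{
    unsigned entry = xid % CLOG_XACTS_PER_PAGE;
    unsigned shift = (entry % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    return (page[entry / CLOG_XACTS_PER_BYTE] >> shift) & 0x3;
}

/*
 * Hypothetical helper: clear the status of next_xid and of every later XID
 * on the same page, leaving entries for earlier XIDs untouched. "page" must
 * be the in-memory copy of the page that contains next_xid.
 */
void
zero_clog_from(unsigned char *page, TransactionId next_xid)
{
    unsigned entry = next_xid % CLOG_XACTS_PER_PAGE;
    unsigned byteno = entry / CLOG_XACTS_PER_BYTE;
    unsigned shift = (entry % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    assert(page != NULL);

    /* Keep only the low-order bits, which belong to XIDs before next_xid. */
    page[byteno] &= (unsigned char) ((1u << shift) - 1);

    /* Every following byte holds only XIDs after next_xid: zero them all. */
    memset(page + byteno + 1, 0, CLOG_PAGE_SIZE - byteno - 1);
}
```

Later pages need no such treatment, since a page is zeroed rather than read from disk when the counter first advances into it.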

Thoughts?

regards, tom lane
