Re: clog_redo causing very long recovery time

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joseph Conway <mail(at)joeconway(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: clog_redo causing very long recovery time
Date: 2011-05-06 03:22:43
Message-ID: 1756.1304652163@sss.pgh.pa.us
Lists: pgsql-hackers

Joseph Conway <mail(at)joeconway(dot)com> writes:
> I'm working with a client that uses Postgres on what amounts to an
> appliance.

> The database is therefore subject to occasional torture such as, in this
> particular case, running out of disk space while performing a million
> plus queries (of mixed varieties, many using plpgsql with exception
> handling -- more on that later), and eventually being power-cycled. Upon
> restart, clog_redo was called approx 885000 times (CLOG_ZEROPAGE) during
> recovery, which took almost 2 hours on their hardware. I should note
> that this is on Postgres 8.3.x.

> After studying the source, I can only see one possible way that this
> could have occurred:

> In varsup.c:GetNewTransactionId(), ExtendCLOG() needs to succeed on a
> freshly zeroed clog page, and ExtendSUBTRANS() has to fail. Both of
> these calls can lead to a page being pushed out of shared buffers and to
> disk, so given a lack of disk space, sufficient clog buffers, but lack
> of subtrans buffers, this could happen. At that point the transaction id
> does not get advanced, so clog zeros the same page, ExtendSUBTRANS()
> fails again, rinse and repeat.

> I believe in the case above, subtrans buffers were exhausted due to the
> extensive use of plpgsql with exception handling.

Hmm, interesting. I believe that it's not really a question of buffer
space or lack of it, but whether the OS will give us disk space when we
try to add a page to the current pg_subtrans file. In any case, the
point is that a failure between ExtendCLOG and incrementing nextXid
is bad news.

> The attached fix-clogredo diff is my proposal for a fix for this.

That seems pretty grotty :-(

I think a more elegant fix might be to just swap the order of the
ExtendCLOG and ExtendSUBTRANS calls in GetNewTransactionId. The
reason that would help is that pg_subtrans isn't WAL-logged, so if
we succeed doing ExtendSUBTRANS and then fail in ExtendCLOG, we
won't have written any XLOG entry, and thus repeated failures will not
result in repeated XLOG entries. I seem to recall having considered
exactly that point when the clog WAL support was first done, but the
scenario evidently wasn't considered when subtransactions were stuck
in :-(.
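
To make the ordering argument concrete, here is a toy model in C. It is
only a sketch, not PostgreSQL source: SimState, sim_extend_clog,
sim_extend_subtrans, sim_get_new_xid and sim_wal_records_after_retries
are all hypothetical names. It assumes every attempt lands on the first
XID of a new page, that the simulated clog extension always succeeds and
emits one WAL record, and that the simulated pg_subtrans extension
always fails for lack of disk space.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef struct {
    bool          subtrans_disk_full; /* pg_subtrans cannot be extended */
    unsigned long next_xid;           /* advanced only on full success */
    unsigned long wal_records;        /* simulated CLOG_ZEROPAGE records */
} SimState;

typedef enum { CLOG_FIRST, SUBTRANS_FIRST } ExtendOrder;

/* Simulated clog extension: zeroing the page is WAL-logged. */
static bool sim_extend_clog(SimState *s)
{
    s->wal_records++;
    return true;
}

/* Simulated subtrans extension: not WAL-logged, fails when disk is full. */
static bool sim_extend_subtrans(SimState *s)
{
    return !s->subtrans_disk_full;
}

/* One attempt to assign an XID; returns false if either extension fails. */
static bool sim_get_new_xid(SimState *s, ExtendOrder order)
{
    if (order == CLOG_FIRST) {
        if (!sim_extend_clog(s))     return false;
        if (!sim_extend_subtrans(s)) return false;
    } else {
        if (!sim_extend_subtrans(s)) return false;
        if (!sim_extend_clog(s))     return false;
    }
    s->next_xid++;                    /* reached only when both succeed */
    return true;
}

/* Retry a failing XID assignment and report how many WAL records piled up. */
unsigned long sim_wal_records_after_retries(ExtendOrder order,
                                            unsigned long attempts)
{
    SimState s = { .subtrans_disk_full = true, .next_xid = 100,
                   .wal_records = 0 };

    for (unsigned long i = 0; i < attempts; i++)
        (void) sim_get_new_xid(&s, order);

    assert(s.next_xid == 100);        /* the XID never advanced */
    return s.wal_records;
}
```

Under those assumptions, calling
sim_wal_records_after_retries(CLOG_FIRST, n) returns n, one record per
failed attempt, which is the pattern behind the 885000 replays reported
above. With SUBTRANS_FIRST it returns 0, because the failure happens
before anything is WAL-logged.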

It would probably also help to put in a comment admonishing people
to not add stuff right there. I see the SSI guys have fallen into
the same trap.

regards, tom lane