Skip site navigation (1) Skip section navigation (2)

Re: WAL logging freezing

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: WAL logging freezing
Date: 2006-10-30 17:05:19
Message-ID: 17552.1162227919@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Ugh.  Is there another solution to this?  Say, sync the buffer so that
> the hint bits are written to disk?

Yeah.  The original design for all this is explained by the notes for
TruncateCLOG:

 * When this is called, we know that the database logically contains no
 * reference to transaction IDs older than oldestXact.	However, we must
 * not truncate the CLOG until we have performed a checkpoint, to ensure
 * that no such references remain on disk either; else a crash just after
 * the truncation might leave us with a problem.

The pre-8.2 coding is actually perfectly safe within a single database,
because TruncateCLOG is only called at the end of a database-wide
vacuum, and so the checkpoint is guaranteed to have flushed valid hint
bits for all tuples to disk.  There is a risk in other databases though.
I think that in the 8.2 structure the equivalent notion must be that
VACUUM has to flush and fsync a table before it can advance the table's
relminxid.

That still leaves us with the problem of hint bits not being updated
during WAL replay.  I think the best solution for this is for WAL replay
to force relvacuumxid to equal relminxid (btw, these field names seem
poorly chosen, and the comment in catalogs.sgml isn't self-explanatory...)
rather than adopting the value shown in the WAL record.  This probably
is best done by abandoning the generic "overwrite tuple" WAL record type
in favor of something specific to minxid updates.  The effect would then
be that a PITR slave would not truncate its clog beyond the freeze
horizon until it had performed a vacuum of its own.

The point about aborted xmax being a risk factor is a good one.  I don't
think the risk is material for ordinary crash recovery scenarios,
because ordinarily we'd have many opportunities to set the hint bit
before anything really breaks, but it's definitely an issue for
long-term PITR replay scenarios.

I'll work on this as soon as I get done with the btree-index issue I'm
messing with now.

			regards, tom lane

In response to

Responses

pgsql-hackers by date

Next:From: Chris BrowneDate: 2006-10-30 17:23:18
Subject: Re: [HACKERS] Replication documentation addition
Previous:From: Jim C. NasbyDate: 2006-10-30 16:40:43
Subject: Re: bug in on_error_rollback !?

pgsql-patches by date

Next:From: Simon RiggsDate: 2006-10-30 20:30:19
Subject: Re: [HACKERS] WAL logging freezing
Previous:From: Alvaro HerreraDate: 2006-10-30 16:20:56
Subject: Re: WAL logging freezing

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group