Re: WAL logging freezing

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: WAL logging freezing
Date: 2006-10-30 17:05:19
Message-ID: 17552.1162227919@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Ugh. Is there another solution to this? Say, sync the buffer so that
> the hint bits are written to disk?

Yeah. The original design for all this is explained by the notes for
TruncateCLOG:

* When this is called, we know that the database logically contains no
* reference to transaction IDs older than oldestXact. However, we must
* not truncate the CLOG until we have performed a checkpoint, to ensure
* that no such references remain on disk either; else a crash just after
* the truncation might leave us with a problem.
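A minimal C sketch of the ordering that comment requires. It uses a toy model in which a checkpoint simply makes the on-disk xid horizon catch up with the in-memory one. All names (ClogState, run_checkpoint, truncate_clog, vacuum_then_truncate) are hypothetical and are not PostgreSQL's actual API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: xids modeled as plain 32-bit counters (no wraparound logic). */
typedef uint32_t TransactionId;

/* Hypothetical model of the state that clog truncation depends on. */
typedef struct ClogState
{
    TransactionId logical_oldest;  /* oldest xid the database still logically references */
    TransactionId ondisk_oldest;   /* oldest xid that may still appear in on-disk pages */
    TransactionId truncated_below; /* clog entries below this xid have been removed */
} ClogState;

/* A checkpoint writes out all dirty buffers, so disk catches up with memory. */
static void run_checkpoint(ClogState *s)
{
    s->ondisk_oldest = s->logical_oldest;
}

/* Refuse to truncate while on-disk pages might still reference older xids. */
static bool truncate_clog(ClogState *s, TransactionId oldestXact)
{
    if (s->ondisk_oldest < oldestXact)
        return false;              /* a crash now could expose xids whose status is gone */
    s->truncated_below = oldestXact;
    return true;
}

/* Vacuum advanced the horizon in shared buffers only; may clog now be truncated? */
static bool vacuum_then_truncate(bool checkpoint_first, TransactionId horizon)
{
    ClogState s = { .logical_oldest = 100, .ondisk_oldest = 100, .truncated_below = 0 };

    s.logical_oldest = horizon;    /* hint bits set in memory, not yet written out */
    if (checkpoint_first)
        run_checkpoint(&s);
    return truncate_clog(&s, horizon);
}
```

Without the checkpoint, vacuum_then_truncate(false, 500) is refused because the on-disk horizon in this model still lags at 100; with it, truncation at 500 is allowed.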

The pre-8.2 coding is actually perfectly safe within a single database,
because TruncateCLOG is only called at the end of a database-wide
vacuum, and so the checkpoint that precedes truncation is guaranteed to
have flushed valid hint bits for all tuples to disk. There is a risk in
other databases, though.
I think that in the 8.2 structure the equivalent notion must be that
VACUUM has to flush and fsync a table before it can advance the table's
relminxid.
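A toy C sketch of the proposed 8.2 rule, in which VACUUM may only advance relminxid once the hint bits it set have been flushed and fsynced. It assumes one "hinted" flag per page and folds flush and fsync into a single step; the names (Table, flush_and_fsync, advance_relminxid, vacuum_table) are hypothetical illustrations, not PostgreSQL functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 4

/* Hypothetical: plain 32-bit xids. */
typedef uint32_t TransactionId;

/* Hypothetical per-table model: hint-bit state in buffers versus durably on disk. */
typedef struct Table
{
    bool          buffer_hinted[NPAGES]; /* hint bits set in shared buffers */
    bool          disk_hinted[NPAGES];   /* hint bits that have been written and fsynced */
    TransactionId relminxid;             /* catalog value VACUUM wants to advance */
} Table;

static void set_hint_bits_in_buffers(Table *t)
{
    for (int i = 0; i < NPAGES; i++)
        t->buffer_hinted[i] = true;
}

/* Write the dirty pages and fsync them, making their hint bits durable. */
static void flush_and_fsync(Table *t)
{
    for (int i = 0; i < NPAGES; i++)
        t->disk_hinted[i] = t->buffer_hinted[i];
}

static bool all_hints_durable(const Table *t)
{
    for (int i = 0; i < NPAGES; i++)
        if (!t->disk_hinted[i])
            return false;
    return true;
}

/* Advance relminxid only if a crash could not lose the hint bits it relies on. */
static bool advance_relminxid(Table *t, TransactionId newmin)
{
    if (!all_hints_durable(t))
        return false;
    t->relminxid = newmin;
    return true;
}

/* One vacuum pass; returns whether relminxid could be advanced. */
static bool vacuum_table(bool fsync_first, TransactionId newmin)
{
    Table t = {0};

    set_hint_bits_in_buffers(&t);
    if (fsync_first)
        flush_and_fsync(&t);
    return advance_relminxid(&t, newmin);
}
```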

That still leaves us with the problem of hint bits not being updated
during WAL replay. I think the best solution for this is for WAL replay
to force relvacuumxid to equal relminxid (btw, these field names seem
poorly chosen, and the comment in catalogs.sgml isn't self-explanatory...)
rather than adopting the value shown in the WAL record. This probably
is best done by abandoning the generic "overwrite tuple" WAL record type
in favor of something specific to minxid updates. The effect would then
be that a PITR slave would not truncate its clog beyond the freeze
horizon until it had performed a vacuum of its own.
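A C sketch of what a dedicated minxid-update record and its redo routine might look like under this proposal. The record layout, field comments, struct names (xl_minxid_update, RelXidInfo), the function names, and the OID 16384 are all hypothetical; the only point illustrated is that replay forces relvacuumxid to equal relminxid instead of adopting the logged value:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: plain 32-bit xids and relation OIDs. */
typedef uint32_t TransactionId;
typedef uint32_t Oid;

/* Hypothetical WAL record specific to minxid updates, replacing the
 * generic "overwrite tuple" record. */
typedef struct xl_minxid_update
{
    Oid           relid;
    TransactionId relminxid;     /* value computed by the original server's VACUUM */
    TransactionId relvacuumxid;  /* original server's value; ignored on replay */
} xl_minxid_update;

/* Hypothetical stand-in for the two pg_class fields of one relation. */
typedef struct RelXidInfo
{
    Oid           relid;
    TransactionId relminxid;
    TransactionId relvacuumxid;
} RelXidInfo;

/* Redo: adopt relminxid, but force relvacuumxid to equal it, so the replaying
 * server will not truncate clog past the freeze horizon until it vacuums itself. */
static void minxid_update_redo(RelXidInfo *rel, const xl_minxid_update *rec)
{
    assert(rel->relid == rec->relid);
    rel->relminxid = rec->relminxid;
    rel->relvacuumxid = rec->relminxid;   /* deliberately not rec->relvacuumxid */
}

/* Replay one record on a fresh entry and report the resulting relvacuumxid. */
static TransactionId replayed_vacuumxid(TransactionId minxid, TransactionId vacuumxid)
{
    RelXidInfo      rel = { .relid = 16384, .relminxid = 0, .relvacuumxid = 0 };
    xl_minxid_update rec = { .relid = 16384, .relminxid = minxid, .relvacuumxid = vacuumxid };

    minxid_update_redo(&rel, &rec);
    return rel.relvacuumxid;
}
```

For a record logged with relminxid 500 and relvacuumxid 900, replay in this sketch leaves relvacuumxid at 500.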

The point about aborted xmax being a risk factor is a good one. I don't
think the risk is material for ordinary crash recovery scenarios,
because ordinarily we'd have many opportunities to set the hint bit
before anything really breaks, but it's definitely an issue for
long-term PITR replay scenarios.
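A toy C sketch of why an aborted xmax is risky: if the "xmax invalid" hint bit never reached disk (for example because it was not set during replay), a later visibility check must consult clog, and once the relevant clog has been truncated the deleter's status can no longer be resolved. The tuple layout, the XidStatus values, and the function names are hypothetical simplifications, not PostgreSQL's real structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: plain 32-bit xids. */
typedef uint32_t TransactionId;

typedef enum { XID_COMMITTED, XID_ABORTED, XID_UNKNOWN } XidStatus;

/* Hypothetical tuple header: deleting xid plus an "xmax invalid" hint bit. */
typedef struct TupleHeader
{
    TransactionId xmax;
    bool          xmax_invalid_hint;  /* set once xmax is known to have aborted */
} TupleHeader;

/* Resolve xmax; clog_oldest is the oldest xid whose clog entry still exists. */
static XidStatus resolve_xmax(const TupleHeader *tup, TransactionId clog_oldest,
                              XidStatus true_status)
{
    if (tup->xmax_invalid_hint)
        return XID_ABORTED;    /* hint bit answers without touching clog */
    if (tup->xmax < clog_oldest)
        return XID_UNKNOWN;    /* clog already truncated: status is lost */
    return true_status;        /* ordinary clog lookup */
}

/* A tuple whose deleter (xid 200) aborted, checked against a given clog horizon. */
static XidStatus check_aborted_deleter(bool hinted, TransactionId clog_oldest)
{
    TupleHeader tup = { .xmax = 200, .xmax_invalid_hint = hinted };

    return resolve_xmax(&tup, clog_oldest, XID_ABORTED);
}
```

With clog kept back to xid 100 the aborted deleter resolves normally; with clog truncated at 300 only the hinted tuple still resolves.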

I'll work on this as soon as I get done with the btree-index issue I'm
messing with now.

regards, tom lane
