Skip site navigation (1) Skip section navigation (2)

Re: WAL logging freezing

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>,<pgsql-patches(at)postgresql(dot)org>
Subject: Re: WAL logging freezing
Date: 2006-10-30 14:08:37
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackerspgsql-patches
We just discussed this in detail with Simon, and it looks like we have 
5 (!) different but related problems:

1) The original problem of freeze then crash, leaving too high values in 
relminxid and datminxid. If you then run vacuum, it might truncate CLOG 
and you lose the commit status of the records that were supposed to be 

To fix this, we need to WAL log freezing as already discussed.

2) vactuple_get_minxid doesn't take into account xmax's of tuples that 
have HEAP_XMAX_INVALID set. That's a problem:

transaction 1001 - BEGIN; DELETE FROM foo where key = 1;
transaction 1001 - ROLLBACK;
transaction 1002 - VACUUM foo;

VACUUM foo will set relminxid to 1002, because HEAP_XMAX_INVALID was set 
on the tuple (possibly by vacuum itself) that the deletion that rolled 
back touched. However, that hint-bit update hasn't hit the disk yet, so 
after recovery, the tuple will have an xmax of 1001 with no hint-bit, 
and relminxid is 1002.

The simplest fix for this issue is to ignore the HEAP_XMAX_INVALID hint 
bit, and take any xmax other than InvalidXid into account when 
calculating the relminxid.

3) If you recover from a PITR backup (or have a hot stand-by), with base 
backup that's more than 4 billion transactions older than the newest WAL 
record, the clog entries of old transactions in the base backup will 
overlap with the clog entries of new transactions that are in the WAL 
records. This is the problem you also pointed out below.

To fix this, we need to emit a WAL record when truncating the clog. We 
must also make sure that recovery of any WAL record type doesn't rely on 
clog, because if we truncate the clog and then crash, recovery won't 
have the clog available for the old transactions. At the moment, 
TruncateCLog issues a checkpoint to protect from that but that's not 
going to work when rolling forward logs in PITR, right?

4) If we fix issue 2 so that vactuple_get_minxid always takes xmax into 
account, even if HEAP_XMAX_INVALID is set, a tuple with an aborted xmax 
will keep us from advancing relminxid and truncating clog etc. That 
doesn't lead to data corruption, but you will eventually hit the 
transaction wrap-around limit. We don't have the same problem with xmin, 
because we freeze tuples that are older than FreezeLimit to avoid it, 
but we don't do that for xmax.

To fix this, replace any xmax older than FreezeLimit with InvalidXid 
during vacuum. That also needs to be WAL logged.

5) We don't freeze tuples that are in RECENTLY_DEAD or 
DELETE_IN_PROGRESS state. That doesn't lead to data corruption, but it 
might make you hit the transaction wrap-around limit. That can happen if 
you have a transaction that deletes or updates a very old, but not yet 
frozen tuple. If you run vacuum while the deleting transaction is in 
progress, vacuum won't freeze the tuple, and won't advance the 
wrap-around limit because of the old tuple. That's not serious if the 
deleting transaction commits, because the next vacuum will then remove 
the tuple, but if it aborts, we might run into the same problem on the 
next vacuum, and the next one, and the next one, until we reach the 

To fix this, simply do the freezing for tuples in RECENTLY_DEAD and 
DELETE_IN_PROGRESS states as well,

Am I missing something? Finding this many bugs makes me nervous...

Simon volunteered to make the clog changes for 3 because it's a PITR 
related issue. I can write a patch/patches for the other changes if it 

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> I think it's premature to start writing
>>> patches until we've decided how this really needs to work.
>> Not logging hint-bit updates seems safe to me. As long as we have the 
>> clog, the hint-bit is just a hint. The problem with freezing is that 
>> after freezing tuples, the corresponding clog page can go away.
> Actually clog can go away much sooner than that, at least in normal
> operation --- that's what datvacuumxid is for, to track where we can
> truncate clog.  Maybe it's OK to say that during WAL replay we keep it
> all the way back to the freeze horizon, but I'm not sure how we keep the
> system from wiping clog it still needs right after switching to normal
> operation.  Maybe we should somehow not xlog updates of datvacuumxid?
> Another thing I'm concerned about is the scenario where a PITR
> hot-standby machine tracks a master over a period of more than 4 billion
> transactions.  I'm not sure what will happen in the slave's pg_clog
> directory, but I'm afraid it won't be good :-(
> 			regards, tom lane

   Heikki Linnakangas

In response to


pgsql-hackers by date

Next:From: Teodor SigaevDate: 2006-10-30 15:34:12
Subject: Re: NOTICE: word is too long INSERT 0 3014
Previous:From: Heikki LinnakangasDate: 2006-10-30 13:42:27
Subject: Re: WAL logging freezing

pgsql-patches by date

Next:From: Alvaro HerreraDate: 2006-10-30 16:20:56
Subject: Re: WAL logging freezing
Previous:From: Heikki LinnakangasDate: 2006-10-30 13:42:27
Subject: Re: WAL logging freezing

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group