WAL shortcoming causes missing-pg_clog-segment problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: WAL shortcoming causes missing-pg_clog-segment problem
Date: 2002-09-26 20:27:43
Message-ID: 6400.1033072063@sss.pgh.pa.us
Lists: pgsql-hackers
I think I've identified a primary cause for the "no such pg_clog file"
problem that we've seen reported several times.

A look at htup.h shows that the WAL only stores the low 8 bits of a
tuple's t_infomask (see xl_heap_header struct).  There is some fooling
around in heapam.c's WAL redo routines to try to reconstitute some of
the high-order bits, for example this:

            htup->t_infomask = HEAP_XMAX_INVALID | xlhdr.mask;

But this is implicitly assuming that we can reconstruct the
XMIN_COMMITTED bit at will.  That was true when the WAL code was
written, but with 7.2's ability to recycle allegedly-no-longer-needed
pg_clog data, we cannot simply drop commit status bits.
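
To make the loss concrete, here is a minimal sketch of the log/redo round trip for t_infomask. The flag values are assumptions modeled on htup.h of this era and should be checked against the real header; the helper names (wal_log_mask, wal_redo_mask, xmin_committed_survives) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, modeled on htup.h; verify against the real header. */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMAX_INVALID   0x0800

/* What the WAL record keeps: only the low 8 bits of t_infomask. */
static uint8_t
wal_log_mask(uint16_t t_infomask)
{
    return (uint8_t) (t_infomask & 0x00FF);
}

/* What redo rebuilds, mirroring the heapam.c assignment quoted above. */
static uint16_t
wal_redo_mask(uint8_t logged_mask)
{
    return (uint16_t) (HEAP_XMAX_INVALID | logged_mask);
}

/* Returns 1 if the XMIN_COMMITTED bit survives a log/redo round trip. */
static int
xmin_committed_survives(uint16_t t_infomask)
{
    uint16_t redone = wal_redo_mask(wal_log_mask(t_infomask));

    /* The low byte always comes back exactly as it was logged. */
    assert((redone & 0x00FF) == (t_infomask & 0x00FF));

    return (redone & HEAP_XMIN_COMMITTED) != 0;
}
```

Under these assumed values, every bit above the low byte other than XMAX_INVALID is lost in the round trip, and XMIN_COMMITTED is one of them.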

The only scenario I've been able to identify in which this actually
causes a failure is when VACUUM FULL moves an old tuple and then shortly
afterwards (before the next checkpoint) there is a crash.  Post-crash,
the tuple move will be redone from WAL, and the moved tuple will be
inserted with zeroed-out commit status bits.  When we next examine the
tuple, we have to try to retrieve its commit status from pg_clog ... but
it's not there anymore.

As far as I can see, the only realistic solution is to store the full 16
bits of t_infomask in the WAL.  We could do this without increasing WAL
size by dropping the t_hoff field from xl_heap_header --- t_hoff is
computable given the number of attributes and the HASNULL/HASOID bits,
both of which are available.  (Actually, we could save some space now
by getting rid of t_oid in xl_heap_header; it's not necessary given that
OID isn't in the fixed tuple header anymore.)
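
A rough sketch of how redo could recompute t_hoff instead of reading it from the WAL record follows. The flag values, the alignment of 8, the OID size of 4, and the function names are assumptions for illustration; the fixed header size is passed in as a parameter because it depends on the actual HeapTupleHeaderData layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag values, modeled on htup.h; verify against the real header. */
#define HEAP_HASNULL 0x0001
#define HEAP_HASOID  0x0010

#define SKETCH_MAXALIGN 8   /* hypothetical platform alignment */
#define SKETCH_OID_SIZE 4   /* assumed sizeof(Oid) */

/* Round len up to the next multiple of SKETCH_MAXALIGN. */
static size_t
sketch_align(size_t len)
{
    return (len + (SKETCH_MAXALIGN - 1)) & ~(size_t) (SKETCH_MAXALIGN - 1);
}

/*
 * Derive the header length from the attribute count and the
 * HASNULL/HASOID bits, which are in the low byte already logged.
 */
static size_t
sketch_compute_hoff(size_t fixed_header_size, int natts, uint16_t t_infomask)
{
    size_t len = fixed_header_size;

    assert(natts >= 0);

    if (t_infomask & HEAP_HASNULL)
        len += ((size_t) natts + 7) / 8;    /* one null-bitmap bit per attribute */

    if (t_infomask & HEAP_HASOID)
        len += SKETCH_OID_SIZE;             /* OID stored after the bitmap */

    return sketch_align(len);
}
```

Since both inputs are already present at redo time, the byte currently spent on t_hoff could carry the high half of t_infomask instead.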

This will require a WAL format change of course.  Fortunately we can do
that without forcing a complete initdb (people will have to run
pg_resetxlog if they want to update a 7.3beta2 database without initdb).

I see no way to fix the problem in the context of 7.2.  Perhaps we
should put out a bulletin warning people to avoid VACUUM FULL in 7.2,
or at least to do CHECKPOINT as soon as possible after one.

			regards, tom lane
