Fundamental error in "no WAL log" index/file creation stuff

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Fundamental error in "no WAL log" index/file creation stuff
Date: 2005-06-25 19:27:51
Message-ID: 28884.1119727671@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I believe I have figured out the problem behind the recent reports we've
been seeing of "index is not a btree" errors. Here's how to reproduce
it (in 8.0 or HEAD):

-- drop database test if present
checkpoint;
create database test;
\c test
create table t1 ( name varchar(20) primary key );
-- kill -9 either current session or bgwriter, to force XLOG replay
-- now reconnect to test
vacuum t1;
ERROR: index "t1_pkey" is not a btree

On investigation, the filesystem shows t1_pkey exists but has size 0.

The reason for this is that the only entry in the XLOG concerning
t1_pkey is an "smgr create" record --- we didn't XLOG any of the
data inserted into the index, and particularly not the metapage.

Why is that a problem, if we fsynced the index? Because *replay of
CREATE DATABASE drops and recreates the entire database directory*.
This is not trivial to avoid, because the only way to generate the
new database is to copy from another database, and it's very hard
to tell what to copy if we want it done selectively.

It seems our choices are (a) somehow fix things so CREATE DATABASE
replay doesn't have to zap the whole directory, (b) force a checkpoint
immediately after any CREATE DATABASE, so that we never have to replay
one except in a PITR situation, or (c) abandon non-WAL-logged index
and table builds.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-06-25 19:52:08 Re: autovacuum bootstrap
Previous Message Alvaro Herrera 2005-06-25 18:23:52 autovacuum bootstrap