Re: Why xlog stuff is done after the filetruncate op in smgrtruncate?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jacky Leng" <lengjianquan(at)163(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why xlog stuff is done after the filetruncate op in smgrtruncate?
Date: 2007-04-16 19:34:17
Message-ID: 9899.1176752057@sss.pgh.pa.us
Lists: pgsql-hackers

"Jacky Leng" <lengjianquan(at)163(dot)com> writes:
> Shouldn't we write xlog record before we do a physical operation?

The reasoning for not doing it that way was that we can't be sure
beforehand that the filesystem operation will succeed. If we xlog
the truncate first, it fails, and then we crash, we're in deep trouble
because WAL replay will try to do the truncate and likewise fail,
preventing the system from restarting. Other non-rollbackable
filesystem ops (I think just CREATE/DROP DATABASE/TABLESPACE) are done
the same way. CREATE DATABASE would be particularly nasty to reverse
the order for, since there are obvious cases like out-of-disk-space
that will make it fail.
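
To make the ordering argument concrete, here is a toy model, not PostgreSQL
code. The names (FsError, replay, run, restart_ok) are invented for
illustration, and the "WAL" is just a Python list. It assumes that replay
redoes every logged operation and that any failure during replay prevents
startup.

```python
class FsError(Exception):
    """Stands in for a failed filesystem operation (e.g. out of disk space)."""


def replay(wal, fs_op):
    """Toy WAL replay: redo every logged operation; a failure aborts startup."""
    for record in wal:
        fs_op(record)  # raises FsError if the operation cannot be redone


def run(log_first, fs_op, wal):
    """Perform one non-rollbackable operation under the chosen ordering."""
    if log_first:
        wal.append("truncate")  # logged before we know it can succeed
        fs_op("truncate")
    else:
        fs_op("truncate")       # if this fails, nothing reaches the WAL
        wal.append("truncate")


def always_fails(record):
    """Hypothetical filesystem op that fails both now and during replay."""
    raise FsError(record)


def restart_ok(log_first):
    """Run the operation, 'crash', then report whether replay can finish."""
    wal = []
    try:
        run(log_first, always_fails, wal)
    except FsError:
        pass  # the backend errors out; assume a crash follows
    try:
        replay(wal, always_fails)
        return True
    except FsError:
        return False
```

In this model restart_ok(True) is False, because the record was logged for an
operation that can never be redone, while restart_ok(False) is True, because
the failed operation never reached the log.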

> A test case:
> 1. set full_page_writes to off;
> 2. start up the database; create a table; insert 100000 rows into it; shut
> down the database;
> 3. start up the database again; delete all rows from this table;
> 4. vacuum this table, which will enter smgrtruncate; kill the postmaster
> before smgrtruncate does its xlog stuff (set a breakpoint before the xlog
> stuff);
> 5. start up the database a 3rd time; during recovery, the database will
> crash with:
> PANIC: WAL contains references to invalid pages

Hmm. Maybe we need something like xlog a "tentative truncate", do it,
xlog "real truncate"? The tentative truncate would merely tell replay
not to be surprised if those blocks aren't there anymore. Seems a bit
grotty though.
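
A rough sketch of how a "tentative truncate" record might behave during
replay, again as a toy model rather than actual xlog code. The record kinds
("update", "tentative_truncate", "truncate") and the function names are
hypothetical. The model assumes that replay remembers references to missing
blocks and only complains at the end if no later record has excused them.

```python
class RecoveryPanic(Exception):
    """Stands in for: PANIC: WAL contains references to invalid pages."""


def replay(wal, nblocks_on_disk):
    """Replay (kind, arg) records against a relation of nblocks_on_disk blocks.

    "update" carries a block number; both truncate kinds carry the new
    length in blocks. Returns the relation length after replay.
    """
    invalid = set()  # referenced blocks that were not found on disk
    for kind, arg in wal:
        if kind == "update":
            if arg >= nblocks_on_disk:
                invalid.add(arg)  # remember it; a later record may excuse it
        elif kind in ("tentative_truncate", "truncate"):
            # Blocks at or beyond the new length are allowed to be missing.
            invalid = {b for b in invalid if b < arg}
            if kind == "truncate":
                # Only the real truncate is redone; the tentative record
                # merely says "do not be surprised".
                nblocks_on_disk = min(nblocks_on_disk, arg)
    if invalid:
        raise RecoveryPanic("WAL contains references to invalid pages")
    return nblocks_on_disk


def panics(wal, nblocks_on_disk):
    """Report whether replaying wal would end in a recovery panic."""
    try:
        replay(wal, nblocks_on_disk)
    except RecoveryPanic:
        return True
    return False
```

With three logged updates to blocks 0..2 and a file already truncated to zero
blocks, the model panics, which is the reported test case. Appending
("tentative_truncate", 0) to the log before the crash lets replay finish,
whether or not the physical truncate actually happened.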

regards, tom lane
