Re: WAL logging problem in 9.4.3?

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL logging problem in 9.4.3?
Date: 2015-07-03 16:39:42
Message-ID: CAHGQGwGzxAyKLUm+z2n43-ee6u976jLmQAz=pPBVzSueD3CAig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 3, 2015 at 11:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> The optimization of TRUNCATE opereation that we can use when
>> CREATE TABLE and TRUNCATE are executed in the same transaction block
>> seems to cause the problem. In this case, only index file truncation is
>> logged, and index creation in btbuild() is not logged because wal_level
>> is minimal. Then at the subsequent crash recovery, index file is truncated
>> to 0 byte... Very simple fix is to log an index creation in that case,
>> but not sure if that's ok to do..
>
>> In 9.2 or before, this problem doesn't occur because no such error is thrown
>> even if an index file size is zero. But in 9.3 or later, since the planner
>> tries to read a meta page of an index to get the height of the btree tree,
>> an empty index file causes such error. The planner was changed that way by
>> commit 31f38f28, and the problem seems to be an oversight of that commit.
>
> What? You want to blame the planner for failing because the index was
> left corrupt by broken WAL replay? A failure would occur anyway at
> execution.

Yep, right. I was not thinking that such index with file size 0 is corrupted
because the reported problem didn't happen before that commit was added.
But that's my fault. Such index can cause an error even in other code paths.

Okay, so probably we need to change WAL replay of TRUNCATE so that
the index file is truncated to one containing only meta page instead of
empty one. That is, the WAL replay of TRUNCATE would need to call
index_build() after smgrtruncate() maybe.

Then how should we implement that? Invent new WAL record type that
calls smgrtruncate() and index_build() during WAL replay? Or add the
special flag to XLOG_SMGR_TRUNCATE record, and make WAL replay
call index_build() only if the flag is found? Any other good idea?
Anyway ISTM that we might need to add or modify WAL record.

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2015-07-03 16:40:45 Re: pgbench - allow backslash-continuations in custom scripts
Previous Message Tom Lane 2015-07-03 16:30:52 Re: psql :: support for \ev viewname and \sv viewname