Re: WAL logging problem in 9.4.3?

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL logging problem in 9.4.3?
Date: 2015-07-03 05:20:49
Message-ID: 20150703052048.GA20285@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 03, 2015 at 12:21:02AM +0200, Andres Freund wrote:
> Hi,
>
> On 2015-07-03 00:05:24 +0200, Martijn van Oosterhout wrote:
> > === Start with an empty database
>
> My guess is you have wal_level = minimal?

Default config, was just initdb'd. So yes, the default wal_level =
minimal.

> > === Note the index file is 8KB.
> > === At this point nuke the database server (in this case it was simply
> > === destroying the container it was running in.
>
> How did you continue from there? The container has persistent storage?
> Or are you repapplying the WAL to somewhere else?

The container has persistant storage on the host. What I think is
actually unusual is that the script that started postgres was missing
an 'exec" so postgres never gets the signal to shutdown.

> > martijn(at)martijn-jessie:$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /data/postgres/pg_xlog/ 000000010000000000000001 |grep -wE '16389|16387|16393'
> > rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/016A9240, prev 0/016A9200, bkp: 0000, desc: checkpoint: redo 0/16A9240; tli 1; prev tli 1; fpw true; xid 0/686; oid 16387; multi 1; offset 0; oldest xid 673 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
> > rmgr: Storage len (rec/tot): 16/ 48, tx: 0, lsn: 0/016A92D0, prev 0/016A92A8, bkp: 0000, desc: file create: base/16385/16387
> > rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016B5E50, prev 0/016B5D88, bkp: 0000, desc: log: rel 1663/16385/16387
> > rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016B5F10, prev 0/016B5E50, bkp: 0000, desc: file create: base/16385/16389
> > rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BB028, prev 0/016BAFD8, bkp: 0000, desc: file create: base/16385/16393
> > rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016BE4F8, prev 0/016BE440, bkp: 0000, desc: log: rel 1663/16385/16387
> > rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6B0, prev 0/016BE660, bkp: 0000, desc: file truncate: base/16385/16389 to 0 blocks
> > rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6E0, prev 0/016BE6B0, bkp: 0000, desc: file truncate: base/16385/16393 to 0 blocks
> > pg_xlogdump: FATAL: error in WAL record at 0/16BE710: record with zero length at 0/16BE740
>
> Note that the truncate will lead to a new, different, relfilenode.

Really? Comparing the relfilenodes gives the same values before and
after the truncate.
>
> > ctmp=# select * from test;
> > ERROR: could not read block 0 in file "base/16385/16393": read only 0 of 8192 bytes
>
> Hm. I can't reproduce this. Can you include a bit more details about how
> to reproduce?

Hmm, for me it is 100% reproducable. Are you familiar with Docker? I
can probably construct a Dockerfile that reproduces it pretty reliably.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2015-07-03 05:23:58 Re: [PROPOSAL] VACUUM Progress Checker.
Previous Message Etsuro Fujita 2015-07-03 05:05:34 Re: Foreign join pushdown vs EvalPlanQual