Re: BUG #15938: Corrupted WAL segment after crash recovery

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: cyberdemn(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15938: Corrupted WAL segment after crash recovery
Date: 2019-08-06 08:11:40
Message-ID: 20190806.171140.265475235.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hello.

At Thu, 01 Aug 2019 13:52:52 +0000, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote in <15938-8591df7e95064538(at)postgresql(dot)org>
> The following bug has been logged on the website:
>
> Bug reference: 15938
> Logged by: Alexander Kukushkin
> Email address: cyberdemn(at)gmail(dot)com
> PostgreSQL version: 10.9
> Operating system: Ubuntu 18.04.2 LTS
> Description:
>
> On one of our cluster one of the postgres backend processes was killed by
> kernel oom.

Although I don't think replication reconnection to
crash-recovered master is generally guaranteed, but this seems
different.

> 2019-08-01 12:41:06.376 UTC,,,22697,,5d42dddb.58a9,4,,2019-08-01 12:40:59
> UTC,,0,LOG,00000,"redo done at B8D/44FFE570",,,,,,,,,""
> 2019-08-01 12:41:06.376 UTC,,,22697,,5d42dddb.58a9,5,,2019-08-01 12:40:59
> UTC,,0,LOG,00000,"last completed transaction was at log time 2019-08-01
> 12:40:54.782747+00",,,,,,,,,""
..
> Unfortunately I can't compare files in the archive and on the primary,
> because it was recycled despite usage of replication slots.
> pg_replication_slots view reports restart_lsn as B8D/451EB540 (the next
> segment).

WAL records since 44ffe6d0 (maybe) till 451eb540 are somehow
ignored during crash recovery of the master, or lost despite it
should have been fsynced out. The only thing I came up for this
is the fsync problem but it is fixed in 10.7. But the following
wiki description:

https://wiki.postgresql.org/wiki/Fsync_Errors

> Linux 4.13 and 4.15: fsync() only reports writeback errors that
> occurred after you called open() so our schemes for closing and
> opening files LRU-style and handing fsync() work off to the
> checkpointer process can hide write-back errors; also buffers are
> marked clean after errors so even if you opened the file before
> the failure, retrying fsync() can falsely report success and the
> modified buffer can be thrown away at any time due to memory
> pressure.

If I read this correctly, after checkpointer failed to fsync
while creating of a preallocate file (this is an ERROR, not a
PANIC), other processes will never receive fsync error about the
file. This scenario is consistent with the fact that it seems
that the data loss starts from new segment's beginning (assuming
that the original 44ffe6d0 continues to the next segment).

Thoughts?

> But I can run pg_waldump, and the last record in this file is
> CHECKPOINT_SHUTDOWN:
> rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn:
> B8D/44FFF6D0, prev B8D/44FFE570, desc: CHECKPOINT_SHUTDOWN redo
> B8D/44FFF6D0; tli 80; prev tli 80; fpw true; xid 0:42293547; oid 10741249;
> multi 1; offset 0; oldest xid 549 in DB 1; oldest multi 1 in DB 1;
> oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
> pg_waldump: FATAL: error in WAL record at B8D/44FFF6D0: invalid record
> length at B8D/44FFF740: wanted 24, got 0

The shutdown record is written at the end of crash recovery of
the master, then replicated. As the result the subsequent bytes
are inconsistent with the record.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Dmitry Dolgov 2019-08-06 11:34:51 Re: BUG #15940: json_populate_recordset fails with ERROR: record type has not been registered
Previous Message Michael Paquier 2019-08-06 07:32:09 Re: BUG #15940: json_populate_recordset fails with ERROR: record type has not been registered

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2019-08-06 08:21:10 Re: Remove HeapTuple and Buffer dependency for predicate locking functions
Previous Message Andres Freund 2019-08-06 07:56:26 Re: POC: Cleaning up orphaned files using undo logs