Re: Corruption during WAL replay

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, tejeswarm(at)hotmail(dot)com, pgsql-hackers(at)postgresql(dot)org, hexexpert(at)comcast(dot)net
Subject: Re: Corruption during WAL replay
Date: 2020-04-11 00:49:05
Message-ID: 20200411004905.GA12834@alvherre.pgsql
Lists: pgsql-hackers

On 2020-Mar-30, Andres Freund wrote:

> If we are really concerned with truncation failing - I don't know why we
> would be, we accept that we have to be able to modify files etc to stay
> up - we can add a pre-check ensuring that permissions are set up
> appropriately to allow us to truncate.

I remember I saw a case where the datadir was NFS or some other network
filesystem thingy, and it lost connection just before autovacuum
truncation, or something like that -- so there was no permission
failure, but the truncate failed and yet PG soldiered on. I think the
connection was re-established soon thereafter and things went back to
"normal", with nobody realizing that a truncate had been lost.
Corruption was discovered a long time afterwards IIRC (weeks or months,
I don't remember).

I didn't review Teja's patch carefully, but the idea of panicking on
failure (causing WAL replay) seems better than the current behavior.
I'd rather have the server wait until storage is really back.
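
To make the "don't soldier on" idea concrete, here is a minimal C sketch. It is not taken from Teja's patch and is not PostgreSQL code; the helper names truncate_checked and truncate_or_panic are hypothetical, and the only real calls used are POSIX ftruncate() and the C library's abort(). It assumes a plain file descriptor and simply refuses to continue when the truncate cannot be confirmed.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: truncate fd to new_len bytes.
 * Retries if interrupted by a signal; returns 0 on success,
 * or the errno value of the failed ftruncate() call.
 */
static int
truncate_checked(int fd, off_t new_len)
{
    int rc;

    do {
        rc = ftruncate(fd, new_len);
    } while (rc != 0 && errno == EINTR);

    return (rc == 0) ? 0 : errno;
}

/*
 * Hypothetical policy wrapper: treat a failed truncate as fatal
 * instead of reporting it and carrying on.  In a real server this
 * is where a PANIC (forcing crash recovery) would go; here we
 * just log to stderr and abort the process.
 */
static void
truncate_or_panic(int fd, off_t new_len)
{
    int err = truncate_checked(fd, new_len);

    if (err != 0) {
        fprintf(stderr, "PANIC: could not truncate file to %lld bytes: %s\n",
                (long long) new_len, strerror(err));
        abort();
    }
}
```

Whether the right reaction is to abort, to retry, or to block until storage comes back is a policy question; the sketch only shows the failure being surfaced rather than ignored.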

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
