Re: Sketch of a fix for that truncation data corruption issue

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sketch of a fix for that truncation data corruption issue
Date: 2018-12-12 01:54:15
Message-ID: 20181212015415.5pphghl3buuz2hob@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-12-12 10:49:59 +0900, Robert Haas wrote:
> Just thinking about this a bit, the problem with truncating first and
> then writing the WAL record is that if the WAL record never makes it
> to disk, any physical standbys will end up out of sync with the
> master, leading to disaster. But the problem with writing the WAL
> record first is that the actual operation might fail, and then
> standbys will end up out of sync with the master, leading to disaster.
> The obvious way to finesse that latter problem is just PANIC if
> ftruncate() fails -- then we'll crash restart and retry, and if we
> still can't do it, well, the DBA will have to fix that before the
> system can come on line. I'm not sure that's really all that bad --
> if we can't truncate, we're kinda hosed. How, other than a
> permissions problem, does that even happen?

I think it's correct to PANIC in that situation. As you say, it's really
unlikely to happen under normal circumstances (as long as we handle
obvious stuff like EINTR), and any added complexity to avoid it seems
very unlikely to be tested.
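
To make the shape of that concrete, here is a minimal sketch in plain
POSIX C, not PostgreSQL code. It assumes the WAL record for the
truncation has already been written and flushed before the call. The
names truncate_retrying_eintr() and truncate_or_panic() are hypothetical,
and abort() merely stands in for a PANIC that would force a crash
restart and a retry during recovery.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call ftruncate(), retrying for as long as the call
 * is interrupted by a signal (EINTR). Returns 0 on success, or -1 with
 * errno set for any other failure.
 */
int
truncate_retrying_eintr(int fd, off_t length)
{
    int rc;

    assert(length >= 0);

    do
    {
        rc = ftruncate(fd, length);
    } while (rc != 0 && errno == EINTR);

    return rc;
}

/*
 * Hypothetical helper: the caller is assumed to have logged the truncation
 * already, so a failure here cannot simply be reported and ignored.
 * abort() is a stand-in for a PANIC-level error in this sketch.
 */
void
truncate_or_panic(int fd, off_t length)
{
    if (truncate_retrying_eintr(fd, length) != 0)
    {
        fprintf(stderr, "PANIC: could not truncate file to %lld bytes: %s\n",
                (long long) length, strerror(errno));
        abort();
    }
}
```

The only error treated as retryable here is EINTR; everything else
(permissions, I/O errors) goes straight to the abort path, which keeps
the failure handling small enough to actually be exercised.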

Greetings,

Andres Freund
