Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thunder <thunder1(at)126(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
Date: 2019-09-27 06:14:14
Message-ID: 20190927061414.GF8485@paquier.xyz
Lists: pgsql-hackers

On Thu, Sep 26, 2019 at 01:13:56AM +0900, Fujii Masao wrote:
> On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> This also points out that there are other things to worry about than
>> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
>> to an ERROR, and this happens before the physical truncation is done
>> but after the WAL record is replayed on the standby, so any failures
>> happening at the truncation phase before the work is done would be a
>> problem. However we are talking about failures which should not
>> happen and these are elog() calls. It would be tempting to add a
>> critical section here, but we could still have problems if we have a
>> failure after the WAL record has been flushed, which means that it
>> would be replayed on the standby, and the surrounding comments are
>> clear about that.
>
> Could you elaborate what problem adding a critical section there occurs?

Wrapping the smgrtruncate() call in RelationTruncate() in a critical
section would make things worse from the user's perspective on the
primary, no?  If the physical truncation fails, WAL replay would still
fail on the standby, but instead of generating an ERROR in the session
of the user attempting the TRUNCATE, the whole primary would be taken
down, as an ERROR raised inside a critical section is promoted to PANIC.
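
To illustrate the trade-off, here is a toy model in C.  It is only a
sketch and not PostgreSQL source: Level, report(), truncate_file(),
relation_truncate() and crit_section_count are hypothetical names
standing in for the error machinery, smgrtruncate(), RelationTruncate()
and CritSectionCount.  The one behavior it models is that an ERROR
raised inside a critical section is promoted to PANIC.

```c
/* Toy severity levels; the names echo PostgreSQL's but this is not its code. */
typedef enum { LEVEL_OK, LEVEL_ERROR, LEVEL_PANIC } Level;

/* Hypothetical counterpart of CritSectionCount: > 0 means "inside a critical section". */
static int crit_section_count = 0;

/* Models error reporting: an ERROR inside a critical section becomes a PANIC. */
static Level report(Level level)
{
    if (level == LEVEL_ERROR && crit_section_count > 0)
        return LEVEL_PANIC;
    return level;
}

/* Hypothetical stand-in for the physical truncation; 'fail' forces an error. */
static Level truncate_file(int fail)
{
    return fail ? report(LEVEL_ERROR) : LEVEL_OK;
}

/*
 * Toy truncation path: the WAL record goes out first, then the file is
 * truncated.  'use_crit_section' toggles the critical section under discussion.
 */
static Level relation_truncate(int use_crit_section, int fail, int *wal_logged)
{
    Level result;

    *wal_logged = 1;            /* record is out; the standby will replay it */

    if (use_crit_section)
        crit_section_count++;   /* models START_CRIT_SECTION() */
    result = truncate_file(fail);
    if (use_crit_section)
        crit_section_count--;   /* models END_CRIT_SECTION() */

    return result;
}
```

In this model, relation_truncate(0, 1, &wal) yields LEVEL_ERROR and
relation_truncate(1, 1, &wal) yields LEVEL_PANIC, while wal is set to 1
in both cases.  So the critical section does not prevent the record
from reaching the standby; it only changes a session-level failure on
the primary into a server-wide one.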
--
Michael
