Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thunder <thunder1(at)126(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
Date: 2019-09-25 16:13:56
Message-ID: CAHGQGwGfEB-+pwWi+ut_fH1qV7v75dRbB2DhJVg9O6ZC-bx-Jw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> > On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
> >> Is this an issue?
> >> Can we fix like this?
> >> Thanks!
> >>
> >
> > I do think it is a valid issue. No opinion on the fix yet, though.
> > The report was sent on saturday, so patience ;-)
>
> And for some others it was even a longer weekend. Anyway, the problem
> can be reproduced if you apply the attached which introduces a failure
> point, and then if you run the following commands:
> create table aa as select 1;
> delete from aa;
> \! touch /tmp/truncate_flag
> vacuum aa;
> \! rm /tmp/truncate_flag
> vacuum aa; -- panic on standby
>
> This also points out that there are other things to worry about than
> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> to an ERROR, and this happens before the physical truncation is done
> but after the WAL record is replayed on the standby, so any failures
> happening at the truncation phase before the work is done would be a
> problem. However we are talking about failures which should not
> happen and these are elog() calls. It would be tempting to add a
> critical section here, but we could still have problems if we have a
> failure after the WAL record has been flushed, which means that it
> would be replayed on the standby, and the surrounding comments are
> clear about that.

Could you elaborate what problem adding a critical section there occurs?

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2019-09-25 17:28:36 Re: PostgreSQL 12 RC1 Press Release Draft
Previous Message Tom Lane 2019-09-25 15:53:27 Re: Proposal for syntax to support creation of partition tables when creating parent table