Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Asim R P <apraveen(at)pivotal(dot)io>
Cc: Paul Guo <pguo(at)pivotal(dot)io>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2019-09-11 14:26:44
Message-ID: a82a896b-93f0-c26c-b941-f5665131381b@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

10.09.2019 14:42, Asim R P wrote:
> Hi Anastasia
>
> On Thu, Aug 22, 2019 at 9:43 PM Anastasia Lubennikova
> <a(dot)lubennikova(at)postgrespro(dot)ru <mailto:a(dot)lubennikova(at)postgrespro(dot)ru>>
> wrote:
> >
> > But during the review, I found a bug in the current implementation.
> > New behavior must apply to crash-recovery only, now it applies to
> archiveRecovery too.
> > That can cause a silent loss of a tablespace during regular standby
> operation
> > since it never calls CheckRecoveryConsistency().
> >
> > Steps to reproduce:
> > 1) run master and replica
> > 2) create dir for tablespace:
> > mkdir  /tmp/tblspc1
> >
> > 3) create tablespace and database on the master:
> > create tablespace tblspc1 location '/tmp/tblspc1';
> > create database db1 tablespace tblspc1 ;
> >
> > 4) wait for replica to receive this changes and pause replication:
> > select pg_wal_replay_pause();
> >
> > 5) move replica's tablespace symlink to some empty directory, i.e.
> /tmp/tblspc2
> > mkdir  /tmp/tblspc2
> > ln -sfn /tmp/tblspc2 postgresql_data_replica/pg_tblspc/16384
> >
>
> By changing the tablespace symlink target, we are silently nullifying
> effects of a committed transaction from the standby data directory -
> the directory structure created by the standby for create tablespace
> transaction.  This step, therefore, does not look like a valid test
> case to me.  Can you share a sequence of steps that does not involve
> changing data directory manually?
>
Hi, the whole idea of the test is to reproduce a data loss. For example,
if the disk containing this tablespace failed.
Probably, simply deleting the directory
'postgresql_data_replica/pg_tblspc/16384'
would work as well, though I was afraid that it can be caught by some
earlier checks and my example won't be so illustrative.
>
> Thank you for the review feedback.  I agree with all the points.  Let
> me incorporate them (I plan to pick this work up and drive it to
> completion as Paul got busy with other things).
>
> But before that I'm revisiting another solution upthread, that of
> creating restart points when replaying create/drop database commands
> before making filesystem changes such as removing a directory.  The
> restart points should align with checkpoints on master.  The concern
> against this solution was creation of restart points will slow down
> recovery.  I don't think crash recovery is affected by this solution
> because of the already existing enforcement of checkpoints.  WAL
> records prior to a create/drop database will not be seen by crash
> recovery due to the checkpoint enforced during the command's normal
> execution.
>

I haven't measured the impact of generating extra restart points in
previous solution, so I cannot tell whether concerns upthread are
justified.  Still, I enjoy latest design more, since it is clear and
similar with the code of checking unexpected uninitialized pages. In
principle it works. And the issue, I described in previous review can be
easily fixed by several additional checks of InHotStandby macro.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2019-09-11 14:53:16 Create collation reporting the ICU locale display name
Previous Message Fabien COELHO 2019-09-11 14:25:07 Re: [patch]socket_timeout in interfaces/libpq