Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: a(dot)lubennikova(at)postgrespro(dot)ru
Cc: apraveen(at)pivotal(dot)io, pguo(at)pivotal(dot)io, thomas(dot)munro(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2019-09-12 08:35:36
Message-ID: 20190912.173536.149449894.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Hello.

At Wed, 11 Sep 2019 17:26:44 +0300, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> wrote in <a82a896b-93f0-c26c-b941-f5665131381b(at)postgrespro(dot)ru>
> 10.09.2019 14:42, Asim R P wrote:
> > Hi Anastasia
> >
> > On Thu, Aug 22, 2019 at 9:43 PM Anastasia Lubennikova
> > <a(dot)lubennikova(at)postgrespro(dot)ru <mailto:a(dot)lubennikova(at)postgrespro(dot)ru>>
> > wrote:
> > >
> > > But during the review, I found a bug in the current implementation.
> > > The new behavior must apply to crash recovery only; now it applies
> > > to archiveRecovery too.
> > > That can cause a silent loss of a tablespace during regular standby
> > > operation, since it never calls CheckRecoveryConsistency().

Yeah.  We should take the same approach as the redo handling of
missing pages: just record the failure while we are still in an
inconsistent state, then forget it if the underlying tablespace turns
out to have been dropped.  If any such record remains when we reach
consistency, we are in a serious situation and should stop recovery.
log_invalid_page, forget_invalid_pages and CheckRecoveryConsistency
are the entry points for understanding that mechanism.

> > > Steps to reproduce:
> > > 1) run master and replica
> > > 2) create dir for tablespace:
> > > mkdir  /tmp/tblspc1
> > >
> > > 3) create tablespace and database on the master:
> > > create tablespace tblspc1 location '/tmp/tblspc1';
> > > create database db1 tablespace tblspc1 ;
> > >
> > > 4) wait for the replica to receive these changes and pause replication:
> > > select pg_wal_replay_pause();
> > >
> > > 5) move replica's tablespace symlink to some empty directory,
> > > i.e. /tmp/tblspc2
> > > mkdir  /tmp/tblspc2
> > > ln -sfn /tmp/tblspc2 postgresql_data_replica/pg_tblspc/16384
> > >
> >
> > By changing the tablespace symlink target, we are silently nullifying
> > the effects of a committed transaction in the standby data directory -
> > the directory structure created by the standby for the create
> > tablespace transaction.  This step, therefore, does not look like a
> > valid test case to me.  Can you share a sequence of steps that does
> > not involve changing the data directory manually?

I see it as the same thing.  With those steps the WAL is inconsistent
with what actually happened on storage; the database is simply broken.

> Hi, the whole idea of the test is to reproduce data loss, for
> example if the disk containing this tablespace failed.

So in that case we apparently must start recovery from a backup taken
before the failure happened, and that should end in success.
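
For example, the standby could be rebuilt from a fresh base backup of
the master, roughly like this (host, port and the tablespace mapping
are only placeholders for this setup):

  pg_basebackup -h master_host -p 5432 -D postgresql_data_replica \
      -R -X stream -T /tmp/tblspc1=/tmp/tblspc2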

# I remember that the starting point of this patch was a crash
# following a tablespace drop that came after several operations
# within that tablespace; crash recovery then fails on an operation
# in the finally-removed tablespace.  Is that right?
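
For illustration (only a sketch from my memory of that report; the
object names are made up), the scenario would be something like this:

  -- on the master
  create tablespace tblspc1 location '/tmp/tblspc1';
  create database db1 tablespace tblspc1;
  -- ... more WAL-logged activity inside the tablespace ...
  drop database db1;
  drop tablespace tblspc1;

Then the standby, which has already replayed all of the above,
crashes (say, kill -9) before taking a new restartpoint; its crash
recovery restarts from the previous restartpoint and trips over an
operation in the already-removed tablespace.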

> Probably, simply deleting the directory
> 'postgresql_data_replica/pg_tblspc/16384'
> would work as well, though I was afraid that it could be caught by
> some earlier checks and my example wouldn't be so illustrative.
> >
> > Thank you for the review feedback.  I agree with all the points.  Let
> > me incorporate them (I plan to pick this work up and drive it to
> > completion as Paul got busy with other things).
> >
> > But before that I'm revisiting another solution upthread, that of
> > creating restart points when replaying create/drop database commands
> > before making filesystem changes such as removing a directory.  The
> > restart points should align with checkpoints on the master.  The
> > concern against this solution was that creating restart points would
> > slow down recovery.  I don't think crash recovery is affected by this
> > solution
> > because of the already existing enforcement of checkpoints.  WAL
> > records prior to a create/drop database will not be seen by crash
> > recovery due to the checkpoint enforced during the command's normal
> > execution.
> >
>
> I haven't measured the impact of generating extra restart points in
> the previous solution, so I cannot tell whether the concerns upthread
> are justified.  Still, I like the latest design more, since it is
> clear and similar to the code that checks for unexpected
> uninitialized pages.  In principle it works, and the issue I
> described in the previous review can easily be fixed by several
> additional checks of the InHotStandby macro.

Generally we shouldn't trigger useless restart points for uncertain
reasons.  If the standby crashes, the next recovery starts from the
latest *restart point*, and even in that case what we should do is
the same.

Of course, for testing, we *should* establish a restartpoint manually
in order to set up the prerequisite state.
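
For example, issuing CHECKPOINT on the standby makes it perform a
restartpoint, as long as a checkpoint record has already been
replayed since the last restartpoint:

  -- connected to the standby
  checkpoint;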

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
