Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pguo(at)pivotal(dot)io
Cc: apraveen(at)pivotal(dot)io, pgsql-hackers(at)postgresql(dot)org
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2019-04-22 07:40:27
Message-ID: 20190422.164027.33866403.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Oops! The comment in the previous patch is wrong.

At Mon, 22 Apr 2019 16:15:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190422(dot)161513(dot)258021727(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Mon, 22 Apr 2019 12:36:43 +0800, Paul Guo <pguo(at)pivotal(dot)io> wrote in <CAEET0ZGpUrMGUzfyzVF9FuSq+zb=QovYa2cvyRnDOTvZ5vXxTw(at)mail(dot)gmail(dot)com>
> > Please see my replies inline. Thanks.
> >
> > On Fri, Apr 19, 2019 at 12:38 PM Asim R P <apraveen(at)pivotal(dot)io> wrote:
> >
> > > On Wed, Apr 17, 2019 at 1:27 PM Paul Guo <pguo(at)pivotal(dot)io> wrote:
> > > >
> > > > create db with tablespace
> > > > drop database
> > > > drop tablespace.
> > >
> > > Essentially, that sequence of operations causes crash recovery to fail
> > > if the "drop tablespace" transaction was committed before crashing.
> > > This is a bug in crash recovery in general and should be reproducible
> > > without configuring a standby. Is that right?
> > >
> >
> > No. In general, checkpoint is done for drop_db/create_db/drop_tablespace on
> > master.
> > That makes the file/directory update-to-date if I understand the related
> > code correctly.
> > For standby, checkpoint redo does not ensure that.
>
> That's right partly. As you must have seen, fast shutdown forces
> restartpoint for the last checkpoint and it prevents the problem
> from happening. Anyway it seems to be a problem.
>
> > > Your patch creates missing directories in the destination. Don't we
> > > need to create the tablespace symlink under pg_tblspc/? I would
> > >
> >
> > 'create db with tablespace' redo log does not include the tablespace real
> > directory information.
> > Yes, we could add in it into the xlog, but that seems to be an overdesign.
>
> But I don't think creating directory that is to be removed just
> after is a wanted solution. The directory most likely to be be
> removed just after.
>
> > > prefer extending the invalid page mechanism to deal with this, as
> > > suggested by Ashwin off-list. It will allow us to avoid creating
> >
> > directories and files only to remove them shortly afterwards when the
> > > drop database and drop tablespace records are replayed.
> > >
> > >
> > 'invalid page' mechanism seems to be more proper for missing pages of a
> > file. For
> > missing directories, we could, of course, hack to use that (e.g. reading
> > any page of
> > a relfile in that database) to make sure the tablespace create code
> > (without symlink)
> > safer (It assumes those directories will be deleted soon).
> >
> > More feedback about all of the previous discussed solutions is welcome.
>
> It doesn't seem to me that the invalid page mechanism is
> applicable in straightforward way, because it doesn't consider
> simple file copy.
>
> Drop failure is ignored any time. I suppose we can ignore the
> error to continue recovering as far as recovery have not reached
> consistency. The attached would work *at least* your case, but I
> haven't checked this covers all places where need the same
> treatment.

The comment for the new function XLogMakePGDirectory is wrong:

+ * There is a possibility that WAL replay causes a creation of the same
+ * directory left by the previous crash. Issuing ERROR prevents the caller
+ * from continuing recovery.

The correct one is:

+ * There is a possibility that WAL replay causes an error by creation of a
+ * directory under a directory removed before the previous crash. Issuing
+ * ERROR prevents the caller from continuing recovery.

It is fixed in the attached.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
ignore_dir_create_error_before_consistency_v2.patch text/x-patch 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-04-22 07:49:19 display of variables in EXPLAIN VERBOSE
Previous Message Konstantin Knizhnik 2019-04-22 07:38:18 Re: block-level incremental backup