Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Paul Guo <pguo(at)pivotal(dot)io>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2019-05-27 13:39:03
Message-ID: CAEET0ZHetF9TD=e=BBuwBN1pSd+D_Geh5hLiK1p=H_yQw7rJWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 14, 2019 at 11:06 AM Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hello.
>
> At Mon, 13 May 2019 17:37:50 +0800, Paul Guo <pguo(at)pivotal(dot)io> wrote in <
> CAEET0ZF9yN4DaXyuFLzOcAYyxuFF1Ms_OQWeA+Rwv3GhA5Q-SA(at)mail(dot)gmail(dot)com>
> > Thanks for the reply.
> >
> > On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
> > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >
> > >
> > > + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> > > + {
> > >
> > > This patch is allowing missing source and destination directory
> > > even in consistent state. I don't think it is safe.
> > >
> >
> > I do not understand this. Can you elaborate?
>
> Suppose we were recoverying based on a backup at LSN1 targeting
> to LSN3 then it crashed at LSN2, where LSN1 < LSN2 <= LSN3. LSN2
> is called as "consistency point", before where the database is
> not consistent. It's because we are applying WAL recored older
> than those that were already applied in the second trial. The
> same can be said for crash recovery, where LSN1 is the latest
> checkpoint ('s redo LSN) and LSN2=LSN3 is the crashed LSN.
>
> Creation of an existing directory or dropping of a non-existent
> directory are apparently inconsistent or "broken" so we should
> stop recovery when seeing such WAL records while database is in
> consistent state.
>

This seems to be hard to detect. I thought using invalid_page mechanism
long ago,
but it seems to be hard to fully detect a dropped tablespace.

> > > 2) Fixed dbase_desc(). Now the xlog output looks correct.
> > > >
> > > > rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn:
> > > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> > > > pg_tblspc/16384/PG_12_201904281/16386
> > > >
> > > > rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn:
> > > > 0/01638EB8, prev 0/01638E40, desc: DROP dir
> > > > pg_tblspc/16384/PG_12_201904281/16386
> > >
> > > WAL records don't convey such information. The previous
> > > description seems right to me.
> > >
> >
> > 2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for
> > Database/CREATE: copy dir 1663/1 to 65546/65547
> > The directories are definitely wrong and misleading.
>
> The original description is right in the light of how the server
> recognizes. The record exactly says that "copy dir 1663/1 to
> 65546/65547" and the latter path is converted in filesystem layer
> via a symlink.
>

In either $PG_DATA/pg_tblspc or symlinked real tablespace directory,
there is an additional directory like PG_12_201905221 between
tablespace oid and database oid. See the directory layout as below,
so the directory info in xlog dump output was not correct.

$ ls -lh data/pg_tblspc/

total 0

lrwxrwxrwx. 1 gpadmin gpadmin 6 May 27 17:23 16384 -> /tmp/2

$ ls -lh /tmp/2

total 0

drwx------. 3 gpadmin gpadmin 18 May 27 17:24 PG_12_201905221

>
>
> > > > > Also I'd suggest we should use pg_mkdir_p() in
> > > TablespaceCreateDbspace()
> > > > > to replace
> > > > > the code block includes a lot of
> > > > > get_parent_directory(), MakePGDirectory(), etc even it
> > > > > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > > > > graceful and simpler.
> > >
> > > But I don't agree to this. pg_mkdir_p goes above two-parents up,
> > > which would be unwanted here.
> > >
> > > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
> > This change just makes the code concise. Though in theory the change is
> not
> > needed.
>
> We don't want to create tablespace direcotory after concurrent
> DROPing, as the comment just above is saying:
>
> | * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
> | * or TablespaceCreateDbspace is running concurrently.
>
> If the concurrent DROP TABLESPACE destroyed the grand parent
> directory, we mustn't create it again.
>

Yes, this is a good reason to keep the original code. Thanks.

By the way, based on your previous test patch I added another test which
could easily detect
the missing src directory case.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-05-27 13:56:02 Re: Minimal logical decoding on standbys
Previous Message Tom Lane 2019-05-27 13:02:17 Re: BEFORE UPDATE trigger on postgres_fdw table not work