Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Paul Guo <pguo(at)pivotal(dot)io>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2019-06-19 07:21:54
Message-ID: CAEET0ZF-0hGtkZGufHaqDzg-Ez89n=1T6raWv-03SP1FtB7wRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 27, 2019 at 9:39 PM Paul Guo <pguo(at)pivotal(dot)io> wrote:

>
>
> On Tue, May 14, 2019 at 11:06 AM Kyotaro HORIGUCHI <
> horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
>> Hello.
>>
>> At Mon, 13 May 2019 17:37:50 +0800, Paul Guo <pguo(at)pivotal(dot)io> wrote in <
>> CAEET0ZF9yN4DaXyuFLzOcAYyxuFF1Ms_OQWeA+Rwv3GhA5Q-SA(at)mail(dot)gmail(dot)com>
>> > Thanks for the reply.
>> >
>> > On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
>> > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> >
>> > >
>> > > + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
>> > > + {
>> > >
>> > > This patch is allowing missing source and destination directory
>> > > even in consistent state. I don't think it is safe.
>> > >
>> >
>> > I do not understand this. Can you elaborate?
>>
>> Suppose we were recoverying based on a backup at LSN1 targeting
>> to LSN3 then it crashed at LSN2, where LSN1 < LSN2 <= LSN3. LSN2
>> is called as "consistency point", before where the database is
>> not consistent. It's because we are applying WAL recored older
>> than those that were already applied in the second trial. The
>> same can be said for crash recovery, where LSN1 is the latest
>> checkpoint ('s redo LSN) and LSN2=LSN3 is the crashed LSN.
>>
>> Creation of an existing directory or dropping of a non-existent
>> directory are apparently inconsistent or "broken" so we should
>> stop recovery when seeing such WAL records while database is in
>> consistent state.
>>
>
> This seems to be hard to detect. I thought using invalid_page mechanism
> long ago,
> but it seems to be hard to fully detect a dropped tablespace.
>
> > > > 2) Fixed dbase_desc(). Now the xlog output looks correct.
>> > > >
>> > > > rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn:
>> > > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
>> > > > pg_tblspc/16384/PG_12_201904281/16386
>> > > >
>> > > > rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn:
>> > > > 0/01638EB8, prev 0/01638E40, desc: DROP dir
>> > > > pg_tblspc/16384/PG_12_201904281/16386
>> > >
>> > > WAL records don't convey such information. The previous
>> > > description seems right to me.
>> > >
>> >
>> > 2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for
>> > Database/CREATE: copy dir 1663/1 to 65546/65547
>> > The directories are definitely wrong and misleading.
>>
>> The original description is right in the light of how the server
>> recognizes. The record exactly says that "copy dir 1663/1 to
>> 65546/65547" and the latter path is converted in filesystem layer
>> via a symlink.
>>
>
> In either $PG_DATA/pg_tblspc or symlinked real tablespace directory,
> there is an additional directory like PG_12_201905221 between
> tablespace oid and database oid. See the directory layout as below,
> so the directory info in xlog dump output was not correct.
>
> $ ls -lh data/pg_tblspc/
>
>
> total 0
>
>
> lrwxrwxrwx. 1 gpadmin gpadmin 6 May 27 17:23 16384 -> /tmp/2
>
>
> $ ls -lh /tmp/2
>
>
> total 0
>
>
> drwx------. 3 gpadmin gpadmin 18 May 27 17:24 PG_12_201905221
>
>>
>>
>> > > > > Also I'd suggest we should use pg_mkdir_p() in
>> > > TablespaceCreateDbspace()
>> > > > > to replace
>> > > > > the code block includes a lot of
>> > > > > get_parent_directory(), MakePGDirectory(), etc even it
>> > > > > is not fixing a bug since pg_mkdir_p() code change seems to be
>> more
>> > > > > graceful and simpler.
>> > >
>> > > But I don't agree to this. pg_mkdir_p goes above two-parents up,
>> > > which would be unwanted here.
>> > >
>> > > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
>> > This change just makes the code concise. Though in theory the change is
>> not
>> > needed.
>>
>> We don't want to create tablespace direcotory after concurrent
>> DROPing, as the comment just above is saying:
>>
>> | * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
>> | * or TablespaceCreateDbspace is running concurrently.
>>
>> If the concurrent DROP TABLESPACE destroyed the grand parent
>> directory, we mustn't create it again.
>>
>
> Yes, this is a good reason to keep the original code. Thanks.
>
> By the way, based on your previous test patch I added another test which
> could easily detect
> the missing src directory case.
>
>

I updated the patch to v3. In this version, we skip the error if copydir
fails due to missing src/dst directory,
but to make sure the ignoring is legal, I add a simple log/forget mechanism
(Using List) similar to the xlog invalid page
checking mechanism. Two tap tests are included. One is actually from a
previous patch by Kyotaro in this
email thread and another is added by me. In addition, dbase_desc() is fixed
to make the message accurate.

Thanks.

Attachment Content-Type Size
v3-0001-skip-copydir-if-either-src-directory-or-dst-direc.patch application/octet-stream 14.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2019-06-19 07:51:19 Re: Some reloptions non-initialized when loaded
Previous Message Amit Kapila 2019-06-19 06:45:32 Re: POC: Cleaning up orphaned files using undo logs