Quick Links

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From:	Paul Guo <pguo(at)pivotal(dot)io>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date:	2019-05-13 09:37:50
Message-ID:	CAEET0ZF9yN4DaXyuFLzOcAYyxuFF1Ms_OQWeA+Rwv3GhA5Q-SA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thanks for the reply.

On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

>
> + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> + {
>
> This patch is allowing missing source and destination directory
> even in consistent state. I don't think it is safe.
>

I do not understand this. Can you elaborate?

>
>
>
> + ereport(WARNING,
> + (errmsg("directory \"%s\" for copydir() does not exists."
> + "It is possibly expected. Skip copydir().",
> + parent_path)));
>
> This message seems unfriendly to users, or it seems like an elog
> message. How about something like this. The same can be said for
> the source directory.
>
> | WARNING: skipped creating database directory: "%s"
> | DETAIL: The tabelspace %u may have been removed just before crash.
>

Yeah. Looks better.

>
> # I'm not confident in this at all:(
>
> > 2) Fixed dbase_desc(). Now the xlog output looks correct.
> >
> > rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn:
> > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> > pg_tblspc/16384/PG_12_201904281/16386
> >
> > rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn:
> > 0/01638EB8, prev 0/01638E40, desc: DROP dir
> > pg_tblspc/16384/PG_12_201904281/16386
>
> WAL records don't convey such information. The previous
> description seems right to me.
>

2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for
Database/CREATE: copy dir 1663/1 to 65546/65547
The directories are definitely wrong and misleading.

> > > Also I'd suggest we should use pg_mkdir_p() in
> TablespaceCreateDbspace()
> > > to replace
> > > the code block includes a lot of
> > > get_parent_directory(), MakePGDirectory(), etc even it
> > > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > > graceful and simpler.
>
> But I don't agree to this. pg_mkdir_p goes above two-parents up,
> which would be unwanted here.
>
> I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
This change just makes the code concise. Though in theory the change is not
needed.

> > > Whatever ignore mkdir failure or mkdir_p, I found that these steps
> seem to
> > > be error-prone
> > > along with postgre evolving since they are hard to test and also we are
> > > not easy to think out
> > > various potential bad cases. Is it possible that we should do real
> > > checkpoint (flush & update
> > > redo lsn) when seeing checkpoint xlogs for these operations? This will
> > > slow down standby
> > > but master also does this and also these operations are not usual,
> > > espeically it seems that it
> > > does not slow down wal receiving usually?
>
> That dramatically slows recovery (not replication) if databases
> are created and deleted frequently. That wouldn't be acceptable.
>

This behavior is rare and seems to have the same impact on master & standby
from checkpoint/restartpoint.
We do not worry about master so we should not worry about standby also.

In response to

Re: standby recovery fails (tablespace related) (tentative patch and discussion) at 2019-05-07 06:47:11 from Kyotaro HORIGUCHI

Responses

Re: standby recovery fails (tablespace related) (tentative patch and discussion) at 2019-05-14 03:06:13 from Kyotaro HORIGUCHI

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Masahiko Sawada	2019-05-13 10:28:25	Re: vacuumdb and new VACUUM options
Previous Message	Oleg Bartunov	2019-05-13 09:14:48	Re: PostgreSQL 12: Feature Highlights