From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | pguo(at)pivotal(dot)io |
Cc: | apraveen(at)pivotal(dot)io, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: standby recovery fails (tablespace related) (tentative patch and discussion) |
Date: | 2019-05-07 06:47:11 |
Message-ID: | 20190507.154711.254360772.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi.
At Tue, 30 Apr 2019 14:33:47 +0800, Paul Guo <pguo(at)pivotal(dot)io> wrote in <CAEET0ZGhmDKrq7JJu2rLLqcJBR8pA4OYrKsirZ5Ft8-deG1e8A(at)mail(dot)gmail(dot)com>
> I updated the original patch to
It's reasonable not to touch copydir.
> 1) skip copydir() if either src path or dst parent path is missing in
> dbase_redo(). Both missing cases seem to be possible. For the src path
> missing case, mkdir_p() is meaningless. It seems that moving the directory
> existence check step to dbase_redo() has less impact on other code.
Nice catch.
+ if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
+ {
This patch is allowing missing source and destination directory
even in consistent state. I don't think it is safe.
+ ereport(WARNING,
+ (errmsg("directory \"%s\" for copydir() does not exists."
+ "It is possibly expected. Skip copydir().",
+ parent_path)));
This message seems unfriendly to users, or it seems like an elog
message. How about something like this. The same can be said for
the source directory.
| WARNING: skipped creating database directory: "%s"
| DETAIL: The tabelspace %u may have been removed just before crash.
# I'm not confident in this at all:(
> 2) Fixed dbase_desc(). Now the xlog output looks correct.
>
> rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn:
> 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> pg_tblspc/16384/PG_12_201904281/16386
>
> rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn:
> 0/01638EB8, prev 0/01638E40, desc: DROP dir
> pg_tblspc/16384/PG_12_201904281/16386
WAL records don't convey such information. The previous
description seems right to me.
> I'm not familiar with the TAP test details previously. I learned a lot
> about how to test such case from Kyotaro's patch series.👍
Yeah, good to hear.
> On Sun, Apr 28, 2019 at 3:33 PM Paul Guo <pguo(at)pivotal(dot)io> wrote:
> > If we want to fix the issue by ignoring the dst path create failure, I do
> > not think we should do
> > that in copydir() since copydir() seems to be a common function. I'm not
> > sure whether it is
> > used by some extensions or not. If no maybe we should move the dst patch
> > create logic
> > out of copydir().
Agreed to this.
> > Also I'd suggest we should use pg_mkdir_p() in TablespaceCreateDbspace()
> > to replace
> > the code block includes a lot of
> > get_parent_directory(), MakePGDirectory(), etc even it
> > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > graceful and simpler.
But I don't agree to this. pg_mkdir_p goes above two-parents up,
which would be unwanted here.
> > Whatever ignore mkdir failure or mkdir_p, I found that these steps seem to
> > be error-prone
> > along with postgre evolving since they are hard to test and also we are
> > not easy to think out
> > various potential bad cases. Is it possible that we should do real
> > checkpoint (flush & update
> > redo lsn) when seeing checkpoint xlogs for these operations? This will
> > slow down standby
> > but master also does this and also these operations are not usual,
> > espeically it seems that it
> > does not slow down wal receiving usually?
That dramatically slows recovery (not replication) if databases
are created and deleted frequently. That wouldn't be acceptable.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2019-05-07 07:17:24 | Re: findTargetlistEntrySQL92() and COLLATE clause |
Previous Message | Amit Kapila | 2019-05-07 06:07:25 | Re: Unhappy about API changes in the no-fsm-for-small-rels patch |