Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Paul Guo <guopa(at)vmware(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Asim Praveen (Pivotal)" <apraveen(at)pivotal(dot)io>, "Lei Wang (Pivotal)" <leiwang(at)pivotal(dot)io>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2021-01-27 08:36:22
Message-ID: 29C02691-6F05-422F-AB13-2B004D60F4CF@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review, please see the replies below.

> On Jan 5, 2021, at 9:07 AM, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 8 Jul 2020 12:56:44 +0000, Paul Guo <guopa(at)vmware(dot)com> wrote in
>> On 2020/01/15 19:18, Paul Guo wrote:
>> I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch series. Let's see the CI pipeline result.
>>
>> Thanks for updating the patches!
>>
>> I started reading the 0003 patch.
>>
>> The approach that the 0003 patch uses is not the perfect solution.
>> If the standby crashes after tblspc_redo() removes the directory and before
>> its subsequent COMMIT record is replayed, PANIC error would occur since
>> there can be some unresolved missing directory entries when we reach the
>> consistent state. The problem would very rarely happen, though...
>> Just idea; calling XLogFlush() to update the minimum recovery point just
>> before tblspc_redo() performs destroy_tablespace_directories() may be
>> safe and helpful for the problem?
>
> It seems to me that what the current patch does is too complex. What
> we need to do here is to remember every invalid operation then forget
> it when the prerequisite object is dropped.
>
> When a table space is dropped before consistency is established, we
> don't need to care what has been performed inside the tablespace. In
> this perspective, it is enough to remember tablespace ids when failed
> to do something inside it due to the absence of the tablespace and
> then forget it when we remove it. We could remember individual
> database id to show them in error messages, but I'm not sure it's
> useful. The reason log_invalid_page records block numbers is to allow
> the machinery handle partial table truncations, but this is not the
> case since dropping tablespace cannot leave some of containing
> databases.
>
> As the result, we won't see an unresolved invalid operations in a
> dropped tablespace.
>
> Am I missing something?

Yes, removing the database id from the hash key in the log/forget code should
be usually fine, but the previous code does stricter/safer checking.

Consider the scenario:

CREATE DATABASE newdb1 TEMPLATE template_db1;
CREATE DATABASE newdb2 TEMPLATE template_db2; <- in case the template_db2 database directory is missing abnormally somehow.
DROP DATABASE template_db1;

The previous code could detect this but if we remove the database id in the code,
this bad scenario is skipped.

>
>
> dbase_redo:
> + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> + {
> + XLogRecordMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
>
> This means "record the belonging table space directory if it is not
> found OR it is not a directory". The former can be valid but the
> latter is unconditionally can not (I don't think we bother considering
> symlinks there).

Again this is a safer check, in the case the parent_path is a file for example somehow,
we should panic finally for the case and let the user checks and then does recovery again.

>
> + /*
> + * Source directory may be missing. E.g. the template database used
> + * for creating this database may have been dropped, due to reasons
> + * noted above. Moving a database from one tablespace may also be a
> + * partner in the crime.
> + */
> + if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)))
> + {
> + XLogLogMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);
>
> This is a part of *creation* of the target directory. Lack of the
> source directory cannot be valid even if the source directory is
> dropped afterwards in the WAL stream and we can allow that if the
> *target* tablespace is dropped afterwards. As the result, as I
> mentioned above, we don't need to record about the database directory.
>
> By the way the name XLogLogMiss.. is somewhat confusing. How about
> XLogReportMissingDir (named after report_invalid_page).

Agree with you.

Also your words remind me that we should skip the checking if the consistency point
is reached.

Here is a git diff against the previous patch. I’ll send out the new rebased patches after
the consensus is reached.

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 7ade385965..c8fe3fe228 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -90,7 +90,7 @@ typedef struct xl_missing_dir
static HTAB *missing_dir_tab = NULL;

void
-XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path)
+XLogReportMissingDir(Oid spcNode, Oid dbNode, char *path)
{
xl_missing_dir_key key;
bool found;
@@ -103,16 +103,6 @@ XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path)
*/
Assert(OidIsValid(spcNode));

- if (reachedConsistency)
- {
- if (dbNode == InvalidOid)
- elog(PANIC, "cannot find directory %s (tablespace %d)",
- path, spcNode);
- else
- elog(PANIC, "cannot find directory %s (tablespace %d database %d)",
- path, spcNode, dbNode);
- }
-
if (missing_dir_tab == NULL)
{
/* create hash table when first needed */
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index fbff422c3b..7bd6d4efd9 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -2205,7 +2205,7 @@ dbase_redo(XLogReaderState *record)
(errmsg("some useless files may be left behind in old database directory \"%s\"",
dst_path)));
}
- else
+ else if (!reachedConsistency)
{
/*
* It is possible that drop tablespace record appearing later in
@@ -2221,7 +2221,7 @@ dbase_redo(XLogReaderState *record)
get_parent_directory(parent_path);
if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
{
- XLogLogMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
+ XLogReportMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
skip = true;
ereport(WARNING,
(errmsg("skipping create database WAL record"),
@@ -2239,9 +2239,10 @@ dbase_redo(XLogReaderState *record)
* noted above. Moving a database from one tablespace may also be a
* partner in the crime.
*/
- if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)))
+ if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)) &&
+ !reachedConsistency)
{
- XLogLogMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);
+ XLogReportMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);
skip = true;
ereport(WARNING,
(errmsg("skipping create database WAL record"),
@@ -2311,7 +2312,8 @@ dbase_redo(XLogReaderState *record)
(errmsg("some useless files may be left behind in old database directory \"%s\"",
dst_path)));

- XLogForgetMissingDir(xlrec->tablespace_ids[i], xlrec->db_id);
+ if (!reachedConsistency)
+ XLogForgetMissingDir(xlrec->tablespace_ids[i], xlrec->db_id);

pfree(dst_path);
}
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 294c9676b4..15eaa757cc 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1534,7 +1534,8 @@ tblspc_redo(XLogReaderState *record)
{
xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);

- XLogForgetMissingDir(xlrec->ts_id, InvalidOid);
+ if (!reachedConsistency)
+ XLogForgetMissingDir(xlrec->ts_id, InvalidOid);

XLogFlush(record->EndRecPtr);

diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index da561af5ab..6561d9cebe 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -23,7 +23,7 @@ extern void XLogDropDatabase(Oid dbid);
extern void XLogTruncateRelation(RelFileNode rnode, ForkNumber forkNum,
BlockNumber nblocks);

-extern void XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path);
+extern void XLogReportMissingDir(Oid spcNode, Oid dbNode, char *path);
extern void XLogForgetMissingDir(Oid spcNode, Oid dbNode);
extern void XLogCheckMissingDirs(void);

diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl
index 748200ebb5..95eb6d26cc 100644
--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -141,7 +141,7 @@ $node_master->wait_for_catchup($node_standby, 'replay',
$node_standby->safe_psql('postgres', 'CHECKPOINT');

# Do immediate shutdown just after a sequence of CREAT DATABASE / DROP
-# DATABASE / DROP TABLESPACE. This causes CREATE DATBASE WAL records
+# DATABASE / DROP TABLESPACE. This causes CREATE DATABASE WAL records

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-01-27 08:58:22 Re: Is Recovery actually paused?
Previous Message Yugo NAGATA 2021-01-27 08:34:43 Re: Is Recovery actually paused?