Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Asim R P <apraveen(at)pivotal(dot)io>
To: Paul Guo <pguo(at)pivotal(dot)io>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Alexandra Wang <leiwang(at)pivotal(dot)io>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2019-09-19 11:59:59
Message-ID: CANXE4TdHyvpOyHVnw=goiazD7d4CiR7=MxtsvSUDphmasSVzvg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 22, 2019 at 6:44 PM Paul Guo <pguo(at)pivotal(dot)io> wrote:
>
> Thanks. I updated the patch to v5. It passes install-check testing and
> recovery testing.
>

This patch contains one more bug, in addition to what Anastasia has found.
If the test case in the patch is tweaked slightly, as follows, the standby
crashes due to PANIC.

--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -147,8 +147,6 @@ $node_standby->start;
 $node_master->poll_query_until(
     'postgres', 'SELECT count(*) = 1 FROM pg_stat_replication');

-$node_master->safe_psql('postgres', "CREATE DATABASE db1 TABLESPACE ts1");
-
 # Make sure to perform restartpoint after tablespace creation
 $node_master->wait_for_catchup($node_standby, 'replay',
                                $node_master->lsn('replay'));
@@ -156,7 +154,8 @@ $node_standby->safe_psql('postgres', 'CHECKPOINT');

 # Do immediate shutdown ...
 $node_master->safe_psql('postgres',
-                        q[ALTER DATABASE db1 SET TABLESPACE ts2;
+                        q[CREATE DATABASE db1 TABLESPACE ts1;
+                          ALTER DATABASE db1 SET TABLESPACE ts2;
                           DROP TABLESPACE ts1;]);
 $node_master->wait_for_catchup($node_standby, 'replay',
                                $node_master->lsn('replay'));

Notice the additional CREATE DATABASE in the above change. It causes the
same tablespace directory (ts1) to be logged twice in the list of missing
directories. At the end of crash recovery, one unmatched entry remains in
the missing dirs list and the standby PANICs.
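
To make the failure mode concrete, here is a toy model in C. It is not
code from the patch: the MissingDirs type, the function names and the
path string are all made up for illustration, and it simply assumes that
logging appends without checking for duplicates while forgetting removes
a single matching entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MISSING 16

/* Hypothetical stand-in for the list of missing directories. */
typedef struct MissingDirs
{
    const char *paths[MAX_MISSING];
    size_t      count;
} MissingDirs;

/* Append unconditionally, so the same path can be recorded more than once. */
static bool
log_missing_dir(MissingDirs *dirs, const char *path)
{
    if (dirs->count >= MAX_MISSING)
        return false;
    dirs->paths[dirs->count++] = path;
    return true;
}

/* Remove only the first entry equal to path; report whether one was found. */
static bool
forget_missing_dir(MissingDirs *dirs, const char *path)
{
    for (size_t i = 0; i < dirs->count; i++)
    {
        if (strcmp(dirs->paths[i], path) == 0)
        {
            memmove(&dirs->paths[i], &dirs->paths[i + 1],
                    (dirs->count - i - 1) * sizeof(dirs->paths[0]));
            dirs->count--;
            return true;
        }
    }
    return false;
}

/* End-of-recovery check: any leftover entry stands in for the PANIC. */
static bool
recovery_would_panic(const MissingDirs *dirs)
{
    return dirs->count > 0;
}
```

With this model, logging the ts1 directory twice and forgetting it once
leaves one entry behind, so recovery_would_panic() reports true.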

Please find attached a couple of tests that build on what was already
written by Paul and Kyotaro. The patch includes a test that demonstrates
the above-mentioned failure and a test case, written by my friend Alexandra,
that implements the archive recovery scenario noted by Anastasia.

In order to fix the test failures, we need to distinguish between a missing
database directory and a missing tablespace directory. We also need to add
logic to forget missing directories during tablespace drop. I am working on it.
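
One possible shape for this is sketched below, as an illustration only
and not the eventual patch. It assumes entries are keyed by a
(tablespace OID, database OID) pair, with InvalidOid in the database
slot marking the tablespace directory itself. The MissingDirKey and
MissingDirSet types and both function names are hypothetical, and Oid is
redefined locally so that the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_MISSING 16

typedef unsigned int Oid;           /* local stand-in for this sketch */
#define InvalidOid ((Oid) 0)

/* dbOid == InvalidOid means the tablespace directory itself is missing. */
typedef struct MissingDirKey
{
    Oid spcOid;
    Oid dbOid;
} MissingDirKey;

typedef struct MissingDirSet
{
    MissingDirKey keys[MAX_MISSING];
    size_t        count;
} MissingDirSet;

/* Record a missing directory once; a repeated key is a no-op. */
static bool
remember_missing_dir(MissingDirSet *set, Oid spcOid, Oid dbOid)
{
    for (size_t i = 0; i < set->count; i++)
    {
        if (set->keys[i].spcOid == spcOid && set->keys[i].dbOid == dbOid)
            return true;
    }
    if (set->count >= MAX_MISSING)
        return false;
    set->keys[set->count].spcOid = spcOid;
    set->keys[set->count].dbOid = dbOid;
    set->count++;
    return true;
}

/*
 * On tablespace drop, forget the tablespace directory and every database
 * directory under it. Returns the number of entries removed.
 */
static size_t
forget_tablespace(MissingDirSet *set, Oid spcOid)
{
    size_t kept = 0;

    for (size_t i = 0; i < set->count; i++)
    {
        if (set->keys[i].spcOid != spcOid)
            set->keys[kept++] = set->keys[i];
    }

    size_t removed = set->count - kept;

    set->count = kept;
    return removed;
}
```

Because duplicates are ignored on insert and a drop clears everything
under the tablespace, replaying the tweaked test sequence would leave no
entry for ts1 at the end of recovery in this model.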

Asim

Attachment Content-Type Size
0001-Tests-for-replay-of-create-database-operation-on-sta.patch application/octet-stream 7.7 KB
