Re: Race condition in recovery?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hlinnaka <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition in recovery?
Date: 2021-06-08 08:47:06
Message-ID: CAFiTN-vWAseUExK=j-pBK2wR1phHOQ_Uc=0HND=p5SdNT+WC9w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 8, 2021 at 11:13 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> # Wait until the node exits recovery.
> $standby->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
> or die "Timed out while waiting for promotion";
>
> I will try to generate a version for 9.6 based on this idea and see how it goes

I have adapted the test for 9.6, but I am seeing a crash (both with
and without the fix). I could not figure out the reason. No core dump
was generated, even though I changed pg_ctl in PostgresNode.pm to use
"-c" so that it could produce a core file.

This is the log from the cascading node (025_stuck_on_old_timeline_cascade.log):
-------------
cp: cannot stat
‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/data_primary_52dW/archives/000000010000000000000003’:
No such file or directory
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
FATAL: could not receive database system identifier and timeline ID
from the primary server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
--------------

The attached logs are from a run without the fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment                                                         Content-Type              Size
9.6-v6-0001-Fix-corner-case-failure-of-new-standby-to-follow-.patch  text/x-patch              6.7 KB
025_stuck_on_old_timeline_cascade.log                              text/x-log                3.2 KB
025_stuck_on_old_timeline_primary.log                              text/x-log                708 bytes
025_stuck_on_old_timeline_standby.log                              text/x-log                1.8 KB
regress_log_025_stuck_on_old_timeline                              application/octet-stream  6.1 KB
