Re: snapper and skink and fairywren (oh my!)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: andrew(at)dunslane(dot)net
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: snapper and skink and fairywren (oh my!)
Date: 2022-01-27 02:55:47
Message-ID: 20220127.115547.376581766206303405.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 26 Jan 2022 18:45:53 -0500, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote in
> It's very unlikely any of this is your fault. In any case, intermittent
> failures are very hard to nail down.

The primary starts at 2022-01-26 16:30:06.613 the accepted a connectin
from the standby at 2022-01-26 16:30:09.911.

P: 2022-01-26 16:30:06.613 UTC [74668:1] LOG: starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe (Rev2, Built by MSYS2 project) 10.3.0, 64-bit
S: 2022-01-26 16:30:09.637 UTC [72728:1] LOG: starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe (Rev2, Built by MSYS2 project) 10.3.0, 64-bit
P: 2022-01-26 16:30:09.911 UTC [74096:3] [unknown] LOG: replication connection authorized: user=pgrunner application_name=standby
S: 2022-01-26 16:30:09.912 UTC [73932:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1

After this the primary restarts.

P: 2022-01-26 16:30:11.832 UTC [74668:7] LOG: database system is shut down
P: 2022-01-26 16:30:12.010 UTC [72140:1] LOG: starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe (Rev2, Built by MSYS2 project) 10.3.0, 64-bit

But the standby doesn't notice the disconnection and continue with the
poll_query_until() on the stale connection. But the query should have
executed after reconnection finally. The following log lines are not
thinned out.

S: 2022-01-26 16:30:09.912 UTC [73932:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
S: 2022-01-26 16:30:12.825 UTC [72760:1] [unknown] LOG: connection received: host=127.0.0.1 port=60769
S: 2022-01-26 16:30:12.830 UTC [72760:2] [unknown] LOG: connection authenticated: identity="EC2AMAZ-P7KGG90\\pgrunner" method=sspi (C:/tools/msys64/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/modules/commit_ts/tmp_check/t_003_standby_2_standby_data/pgdata/pg_hba.conf:2)
S: 2022-01-26 16:30:12.830 UTC [72760:3] [unknown] LOG: connection authorized: user=pgrunner database=postgres application_name=003_standby_2.pl
S: 2022-01-26 16:30:12.838 UTC [72760:4] 003_standby_2.pl LOG: statement: SELECT '0/303C7D0'::pg_lsn <= pg_last_wal_replay_lsn()

Since the standby dones't receive WAL from the restarted server so the
poll_query_until() doesn't return.

I'm not sure why walsender of the standby continued running not knowing the primary has been once dead for such a long time.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2022-01-27 03:45:10 Re: pg_log_backend_memory_contexts() and log level
Previous Message Imseih (AWS), Sami 2022-01-27 02:15:16 Re: Add index scan progress to pg_stat_progress_vacuum