From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Race conditions in 019_replslot_limit.pl |
Date: | 2022-02-17 02:14:44 |
Message-ID: | 20220217021444.wnaxfigeyvogyzi3@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2022-02-16 20:22:23 -0500, Tom Lane wrote:
> There's no disconnection log entry for either, which I suppose means
> that somebody didn't bother logging disconnection for walsenders ...
The thing is, we actually *do* log disconnection for walsenders:
2022-02-16 17:53:55.780 PST [2459806][walsender][:0] LOG: disconnection: session time: 0:00:01.321 user=andres database= host=[local]
The only way I see for the disconnection not to be logged is an immediate
shutdown.
It'd be one thing if the missing walsender disconnect wasn't logged because we
shut down just after. But unless I misunderstand something, between the
basebackup and the failing test, we actually start standby_3:
# Running: pg_basebackup -D /home/bf/build/buildfarm-serinus/HEAD/pgsql.build/src/test/recovery/tmp_check/t_019_replslot_limit_primary3_data/backup/my_backup -h /tmp/hyrtnyd2RU -p 53774 --checkpoint fast --no-sync
# Backup finished
...
# Running: pg_ctl -w -D /home/bf/build/buildfarm-serinus/HEAD/pgsql.build/src/test/recovery/tmp_check/t_019_replslot_limit_standby_3_data/pgdata -l /home/bf/build/buildfarm-serinus/HEAD/pgsql.build/src/test/recovery/tmp_check/log/019_replslot_limit_standby_3.log -o --cluster-name=standby_3 start
and also wait till replication has caught up:
Waiting for replication conn standby_3's replay_lsn to pass 0/700000 on primary3
done
and only then we have
not ok 17 - have walsender pid 3682154
# 3682136
# Failed test 'have walsender pid 3682154
# 3682136'
# at t/019_replslot_limit.pl line 335.
# '3682154
# 3682136'
# doesn't match '(?^:^[0-9]+$)'
Starting a node in recovery and having it connect to the primary seems like a
mighty long time for a process to exit, unless it's stuck behind something.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2022-02-17 02:17:52 | Re: killing perl2host |
Previous Message | Tom Lane | 2022-02-17 02:12:12 | Re: libpq async duplicate error results |