| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Andrey Silitskiy <a(dot)silitskiy(at)postgrespro(dot)ru> |
| Cc: | Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ronan Dunklau <ronan(at)dunklau(dot)fr>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Subject: | Re: Exit walsender before confirming remote flush in logical replication |
| Date: | 2026-03-25 12:39:00 |
| Message-ID: | CAHGQGwHZnhodg7+xQZQOoRhX+emPUoPPgKZm7fOYOH3DpxxC8g@mail.gmail.com |
| Lists: | pgsql-hackers |
On Mon, Mar 16, 2026 at 5:52 AM Andrey Silitskiy
<a(dot)silitskiy(at)postgrespro(dot)ru> wrote:
>
> On Fri, 13 Mar 2026 Greg Sabino Mullane <htamfids(at)gmail(dot)com> wrote:
> > +1. I don't think we need to measure any times, but we do need to
> > exercise that whole part of the code, ...
>
> Added a case with a positive but small wal_sender_shutdown_timeout
> to the test.

Thanks for updating the patch!

I tested wal_sender_shutdown_timeout under several configurations and
found a case where the primary shutdown got stuck even with the patch
applied and wal_sender_shutdown_timeout = 1. I'm not sure yet whether this
is a bug in the patch or a problem with my test setup, but I'd like to
share the reproduction steps for reference.

--------------------------------------------------------------------------------
#1. Set up primary, standby (with slot sync), and subscriber
initdb -D data --encoding=UTF8 --locale=C
cat <<EOF >> data/postgresql.conf
wal_level = logical
synchronized_standby_slots = 'physical_slot'
wal_sender_timeout = 1h
wal_sender_shutdown_timeout = 1
log_line_prefix = '%t %p [%b] data '
EOF
pg_ctl -D data start
pg_receivewal --create-slot -S physical_slot
pg_basebackup -D sby1 -c fast -R -S physical_slot -d "dbname=postgres"
cat <<EOF >> sby1/postgresql.conf
port = 5433
sync_replication_slots = on
hot_standby_feedback = on
log_line_prefix = '%t %p [%b] sby1 '
EOF
pg_ctl -D sby1 start
initdb -D sub1 --encoding=UTF8 --locale=C
cat <<EOF >> sub1/postgresql.conf
port = 5434
wal_level = logical
log_line_prefix = '%t %p [%b] sub1 '
EOF
pg_ctl -D sub1 start
#2. Create table, publication, and subscription
psql -p 5432 <<EOF
create table t (i int primary key, j int);
create publication mypub for table t;
EOF
psql -p 5434 <<EOF
create table t (i int primary key, j int);
create subscription mysub connection 'port=5432' publication mypub
with (failover = 'on');
EOF
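#3. Lock the table on the subscriber (run this in a separate terminal;
#   the session blocks in pg_sleep and must stay open to hold the lock)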
psql -p 5434 <<EOF
begin;
lock table t in access exclusive mode;
select pg_sleep(1000);
EOF
#4. Confirm the apply worker is waiting
psql -p 5432 -c "insert into t values(1, 0)"
psql -p 5434 -c "select * from pg_stat_activity where backend_type like '%apply worker%'"
#5. Block the walreceiver on the standby with SIGSTOP
kill -SIGSTOP $(psql -p 5433 -X -Atc "SELECT pid FROM pg_stat_wal_receiver")
#6. Insert another row and shut down the primary
psql -p 5432 -c "insert into t values(2, 0)"
pg_ctl -D data stop
--------------------------------------------------------------------------------
In my tests, the shutdown in step #6 got stuck.
Any thoughts on whether this indicates a problem in the patch or
something off in my setup?
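
To make the stuck shutdown in step #6 observable from a script rather than by watching the terminal, something like the following could be used. It is only a sketch and was not part of the test above: the function name shutdown_or_report and the PG_CTL override variable are hypothetical, and it relies only on pg_ctl's -t (timeout) option and its non-zero exit status when the server has not stopped in time.

```shell
# Sketch only: classify a shutdown attempt of the given data directory.
# PG_CTL is a hypothetical override for the pg_ctl binary to run.
shutdown_or_report() {
    datadir=$1
    limit=${2:-10}    # seconds pg_ctl waits before giving up (-t)
    if "${PG_CTL:-pg_ctl}" -D "$datadir" stop -t "$limit" >/dev/null 2>&1; then
        echo "stopped"
    else
        # Non-zero exit: the server did not stop within the limit
        # (or pg_ctl failed for another reason, e.g. no server running).
        echo "not stopped"
        return 1
    fi
}
```

For example, `shutdown_or_report data 10 || ps ax | grep '[w]alsender'` would list any walsender processes still alive after a 10-second wait, assuming process titles are enabled on the platform.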
Regards,
--
Fujii Masao