BUG #15617: walsender hang if sync replica disconnected from network

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: efimkin(at)yandex-team(dot)ru
Subject: BUG #15617: walsender hang if sync replica disconnected from network
Date: 2019-02-01 11:45:11
Message-ID: 15617-8dfbde784d8e3258@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15617
Logged by: Evgeny Efimkin
Email address: efimkin(at)yandex-team(dot)ru
PostgreSQL version: 10.6
Operating system: Ubuntu 14.04.5 LTS (GNU/Linux 4.4.52-25 x86_64)
Description:

I have hit a very rare problem where wal_sender_timeout did not take effect
when a synchronous replica had a network problem (it was disconnected). I
cannot reproduce it in a test environment.

Some diagnostics:

xdb41f/postgres M # select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg14.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
(1 row)

Time: 0.214 ms

xdb41f/postgres M # show wal_sender_timeout ;
wal_sender_timeout
--------------------
1min
(1 row)

pg_stat_activity:
xdb41f/postgres M # select * from pg_stat_activity where pid=556566;
-[ RECORD 1 ]----+----------------------------------
datid | [null]
datname | [null]
pid | 556566
usesysid | 16403
usename | repl
application_name | xdb41e_mail_yandex_net
client_addr | 2a02:6b8:0:801:ec4:7aff:fe52:cd5e
client_hostname | xdb41e.mail.yandex.net
client_port | 39132
backend_start | 2018-12-13 03:25:16.891091+03
xact_start | [null]
query_start | [null]
state_change | 2018-12-13 03:25:16.906387+03
wait_event_type | Client
wait_event | ClientWrite
state | active
backend_xid | [null]
backend_xmin | [null]
query |
backend_type | walsender

backtrace on walsender process:
Wed Jan 30 18:20:10 2019
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fd1b38bf6b3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:81
#0 0x00007fd1b38bf6b3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:81
#1 0x000055b85a26f4d9 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ffce744cfd0, cur_timeout=-1, set=0x55b85c5e3d98) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/storage/ipc/latch.c:1048

#2 WaitEventSetWait (set=0x55b85c5e3d98, timeout=timeout@entry=-1,
occurred_events=occurred_events@entry=0x7ffce744cfd0,
nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=100663297) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/storage/ipc/latch.c:1000

#3 0x000055b85a18da5f in secure_write (port=0x55b85c6380a0,
ptr=ptr@entry=0x7fd1b5cb0048, len=len@entry=131102) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/libpq/be-secure.c:273
#4 0x000055b85a198cc6 in internal_flush () at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/libpq/pqcomm.c:1433
#5 0x000055b85a198ee1 in internal_putbytes (s=s@entry=0x7ffce744d0ac "S",
len=len@entry=1) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/libpq/pqcomm.c:1379
#6 0x000055b85a199002 in socket_putmessage (msgtype=83 'S',
s=0x55b85c65ebd8 "client_encoding", len=26) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/libpq/pqcomm.c:1566
#7 0x000055b85a19b024 in pq_endmessage (buf=buf@entry=0x7ffce744d100) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/libpq/pqformat.c:347
#8 0x000055b85a3b22a9 in ReportGUCOption (record=0x55b85a80e7f8
<ConfigureNamesString+152>) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/utils/misc/guc.c:5408
#9 0x000055b85a3b39d8 in set_config_option (name=name@entry=0x55b85a5389ef
"client_encoding", value=0x55b85a52a4a2 "SQL_ASCII",
context=context@entry=PGC_BACKEND,
source=source@entry=PGC_S_DYNAMIC_DEFAULT,
action=action@entry=GUC_ACTION_SET, changeVal=changeVal@entry=1 '\001',
elevel=<optimized out>, elevel@entry=0, is_reload=is_reload@entry=0 '\000')
at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/utils/misc/guc.c:6638
#10 0x000055b85a3b4275 in SetConfigOption (name=name@entry=0x55b85a5389ef
"client_encoding", value=<optimized out>, context=context@entry=PGC_BACKEND,
source=source@entry=PGC_S_DYNAMIC_DEFAULT) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/utils/misc/guc.c:6685
#11 0x000055b85a3b94f0 in ProcessConfigFileInternal
(context=context@entry=PGC_SIGHUP, applySettings=applySettings@entry=1
'\001', elevel=elevel@entry=13) at guc-file.l:411
#12 0x000055b85a3b9974 in ProcessConfigFile (context= ) at
guc-file.l:155
#13 0x000055b85a247cd0 in WalSndLoop
(send_data=send_data@entry=0x55b85a247790 <XLogSendPhysical>) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/replication/walsender.c:2132
#14 0x000055b85a2482bc in StartReplication (cmd=0x55b85c600de8) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/replication/walsender.c:684
#15 exec_replication_command (cmd_string=cmd_string@entry=0x55b85c707658
"START_REPLICATION SLOT \"xdb41e_mail_yandex_net\" 57AC/BC000000 TIMELINE
1") at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/replication/walsender.c:1535
#16 0x000055b85a2910e6 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x55b85c63a8e0, dbname=0x55b85c63a820 "",
username=<optimized out>) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/tcop/postgres.c:4113
#17 0x000055b859fd44b5 in BackendRun (port=0x55b85c6380a0) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:4405
#18 BackendStartup (port=0x55b85c6380a0) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:4077
#19 ServerLoop () at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:1755
#20 0x000055b85a22316e in PostmasterMain (argc=11, argv=<optimized out>) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:1363
#21 0x000055b859fd5406 in main (argc=11, argv=0x55b85c5e21f0) at
/build/postgresql-10-hXjK9Z/postgresql-10-10.6/build/../src/backend/main/main.c:228
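
For reference, the wait_event_info value in frame #2 can be decoded by hand and
compared with the pg_stat_activity row above. The following is a minimal sketch
in Python, not PostgreSQL code: the constant PG_WAIT_CLIENT, the two event names
and the bit masks are assumptions taken from my reading of PostgreSQL 10's
src/include/pgstat.h, and decode_wait_event_info is a hypothetical helper name.
In the same frame, timeout=-1 asks the wait to block with no time limit.

```python
# Assumed from PostgreSQL 10's src/include/pgstat.h: the wait class lives in
# the high byte of wait_event_info and the event number in the low 16 bits.
PG_WAIT_CLIENT = 0x06000000

# Only the first two events of the "Client" class are listed here (assumption).
CLIENT_EVENTS = {0: "ClientRead", 1: "ClientWrite"}


def decode_wait_event_info(info):
    """Split a raw wait_event_info integer into (class name, event name)."""
    wait_class = info & 0xFF000000
    event_id = info & 0x0000FFFF
    if wait_class != PG_WAIT_CLIENT:
        # Other wait classes are outside the scope of this sketch.
        return ("unknown class 0x%08X" % wait_class, "unknown")
    return ("Client", CLIENT_EVENTS.get(event_id, "unknown"))


if __name__ == "__main__":
    # Value copied from frame #2 of the backtrace.
    print(decode_wait_event_info(100663297))
```

Under those assumptions the value decodes to the Client class and the
ClientWrite event, which matches wait_event_type and wait_event reported by
pg_stat_activity for pid 556566.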

Any idea why this happened? If it happens again, which diagnostics should I
collect, and how?
