Replication slot drop message is sent after pgstats shutdown.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Replication slot drop message is sent after pgstats shutdown.
Date: 2021-08-31 02:37:08
Message-ID: CAD21AoBgSTF8gp1SKojKRu9dqzN4p1Ob6Mh=QgVhGfLO1NtUYA@mail.gmail.com
Lists: pgsql-hackers

Hi all,

I found another path where we report stats after the stats collector
has shut down. The reproducer and the backtrace I got are here:

1. psql -c "begin; create table a (a int); select pg_sleep(30); commit;" &
2. pg_recvlogical --create-slot -S slot -d postgres &
3. stop the server

TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4752, PID: 62789)
0 postgres 0x000000010a8ed79a ExceptionalCondition + 234
1 postgres 0x000000010a5e03d2 pgstat_assert_is_up + 66
2 postgres 0x000000010a5e1dc4 pgstat_send + 20
3 postgres 0x000000010a5e1d5c pgstat_report_replslot_drop + 108
4 postgres 0x000000010a64c796 ReplicationSlotDropPtr + 838
5 postgres 0x000000010a64c0e9 ReplicationSlotDropAcquired + 89
6 postgres 0x000000010a64bf23 ReplicationSlotRelease + 99
7 postgres 0x000000010a6d60ab ProcKill + 219
8 postgres 0x000000010a6a350c shmem_exit + 444
9 postgres 0x000000010a6a326a proc_exit_prepare + 122
10 postgres 0x000000010a6a3163 proc_exit + 19
11 postgres 0x000000010a8ee665 errfinish + 1109
12 postgres 0x000000010a6e3535 ProcessInterrupts + 1445
13 postgres 0x000000010a65f654 WalSndWaitForWal + 164
14 postgres 0x000000010a65edb2 logical_read_xlog_page + 146
15 postgres 0x000000010a22c336 ReadPageInternal + 518
16 postgres 0x000000010a22b860 XLogReadRecord + 320
17 postgres 0x000000010a619c67 DecodingContextFindStartpoint + 231
18 postgres 0x000000010a65c105 CreateReplicationSlot + 1237
19 postgres 0x000000010a65b64c exec_replication_command + 1180
20 postgres 0x000000010a6e6d2b PostgresMain + 2459
21 postgres 0x000000010a5ef1a9 BackendRun + 89
22 postgres 0x000000010a5ee6fd BackendStartup + 557
23 postgres 0x000000010a5ed487 ServerLoop + 759
24 postgres 0x000000010a5eac22 PostmasterMain + 6610
25 postgres 0x000000010a4c32d3 main + 819
26 libdyld.dylib 0x00007fff73477cc9 start + 1

At step #2, after creating the replication slot, the wal sender waits
for the transaction started at step #1 to complete. When the server is
stopped, the wal sender process drops the slot while releasing it,
since the slot is still RS_EPHEMERAL. After dropping the slot, we send
the slot-drop message to the stats collector (see
ReplicationSlotDropPtr()). All of this runs in ReplicationSlotRelease(),
called by ProcKill(), which runs as an on_shmem_exit callback. By then
pgstats has already been shut down by a before_shmem_exit callback. I
haven't tested it yet, but I think the same can happen when dropping a
temporary slot, since ProcKill() also calls ReplicationSlotCleanup() to
clean up temporary slots.
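
To make the ordering concrete, here is a minimal, self-contained toy
model of it. This is not PostgreSQL source: every toy_* name is a
hypothetical stand-in, and the only behavior it assumes is that the
before_shmem_exit callbacks run before the on_shmem_exit callbacks.
Instead of tripping an assertion, the report function counts reports
that arrive after the stats shutdown.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the exit-callback ordering; NOT PostgreSQL source code.
 * All toy_* names are hypothetical stand-ins for illustration.
 */
typedef void (*toy_callback) (void);

#define TOY_MAX_CALLBACKS 8

static toy_callback toy_before_list[TOY_MAX_CALLBACKS];
static int  toy_before_count;
static toy_callback toy_on_list[TOY_MAX_CALLBACKS];
static int  toy_on_count;

static bool toy_stats_is_shutdown;
static int  toy_late_reports;   /* reports attempted after stats shutdown */

/* Stand-in for the pgstats shutdown done by a before_shmem_exit callback. */
static void
toy_stats_shutdown(void)
{
    toy_stats_is_shutdown = true;
}

/* Stand-in for pgstat_report_replslot_drop(); counts instead of asserting. */
static void
toy_report_slot_drop(void)
{
    if (toy_stats_is_shutdown)
        toy_late_reports++;     /* an assert-enabled build would TRAP here */
}

/* Stand-in for ProcKill() releasing, and thereby dropping, an ephemeral slot. */
static void
toy_proc_kill(void)
{
    toy_report_slot_drop();
}

/* Run every "before" callback, then every "on" callback, newest first. */
static void
toy_shmem_exit(void)
{
    while (toy_before_count > 0)
        toy_before_list[--toy_before_count] ();
    while (toy_on_count > 0)
        toy_on_list[--toy_on_count] ();
}

/* Register both callbacks, run the exit sequence, return the late-report count. */
int
toy_simulate_backend_exit(void)
{
    toy_before_count = toy_on_count = 0;
    toy_stats_is_shutdown = false;
    toy_late_reports = 0;

    assert(toy_before_count < TOY_MAX_CALLBACKS);
    toy_before_list[toy_before_count++] = toy_stats_shutdown;
    assert(toy_on_count < TOY_MAX_CALLBACKS);
    toy_on_list[toy_on_count++] = toy_proc_kill;

    toy_shmem_exit();
    return toy_late_reports;
}
```

In this model toy_simulate_backend_exit() returns 1: the single
slot-drop report is attempted only after the stats side is already
marked as shut down, which is the situation the assertion catches.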

There are some ideas to fix this issue, but I don't think it's a good
idea in this case to move either ProcKill() or the slot-releasing code
to before_shmem_exit, as we did for other similar issues[1][2].
Reporting the drop message at the moment the slot is dropped isn't
strictly essential, since autovacuum periodically checks for
already-dropped slots and reports that their stats should be dropped.
So another idea would be to move pgstat_report_replslot_drop() to a
higher layer, such as ReplicationSlotDrop() and
ReplicationSlotsDropDBSlots(), which are not called during exit
callbacks. With that change, the replication slot stats are dropped
when the slot is dropped via commands such as
pg_drop_replication_slot() and DROP_REPLICATION_SLOT. For temporary and
ephemeral slots, on the other hand, we would rely on autovacuum to drop
their stats. Even if dropping the stats for those slots is delayed,
pg_stat_replication_slots doesn't show stats for already-dropped slots.
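
The layering described above could look roughly like the following toy
model. It is only a sketch and not a patch: all sketch_* names and the
counters are hypothetical, and the cleanup function merely stands in
for the periodic check that this message attributes to autovacuum.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the proposed layering; NOT PostgreSQL source code.
 * All sketch_* names are hypothetical.
 */
struct sketch_state
{
    bool stats_is_shutdown; /* has the stats side been shut down? */
    int  live_slots;        /* slots that still exist */
    int  stats_entries;     /* per-slot stats entries still held */
    int  late_reports;      /* reports attempted after stats shutdown */
};

/* Stand-in for pgstat_report_replslot_drop(). */
static void
sketch_report_slot_drop(struct sketch_state *s)
{
    if (s->stats_is_shutdown)
    {
        s->late_reports++;  /* would be an assertion failure */
        return;
    }
    s->stats_entries--;
}

/* Low-level drop, like ReplicationSlotDropPtr() with the report removed. */
static void
sketch_drop_ptr(struct sketch_state *s)
{
    assert(s->live_slots > 0);
    s->live_slots--;
}

/* Command-level drop, like ReplicationSlotDrop(): drop, then report. */
void
sketch_drop_command(struct sketch_state *s)
{
    sketch_drop_ptr(s);
    sketch_report_slot_drop(s);
}

/* Exit-time release of an ephemeral or temporary slot: no report at all. */
void
sketch_release_at_exit(struct sketch_state *s)
{
    sketch_drop_ptr(s);
}

/* Periodic cleanup: discard stats entries that no longer match a slot. */
void
sketch_vacuum_stats(struct sketch_state *s)
{
    if (s->stats_entries > s->live_slots)
        s->stats_entries = s->live_slots;
}
```

The trade-off shows up in the counters: a slot dropped through the
command path loses its stats entry immediately, while a slot dropped at
exit leaves one behind until the periodic cleanup runs, and no report
is ever attempted after the stats shutdown.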

Any other ideas?

Regards,

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=675c945394b36c2db0e8c8c9f6209c131ce3f0a8
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dcac5e7ac157964f71f15d81c7429130c69c3f9b

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
