Slow standby snapshot

From: Кирилл Решке <reshkekirill(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Slow standby snapshot
Date: 2021-05-20 08:52:47
Message-ID: CALdSSPgahNUD_=pB_j=1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw@mail.gmail.com
Lists: pgsql-hackers

Hi,
I recently ran into a problem in one of our production PostgreSQL clusters.
I noticed lock contention on the ProcArray lock on a standby, which causes
WAL replay lag to grow.
To reproduce this, you can do the following:

1) set max_connections to a large number, e.g. 100000
2) begin a transaction on the primary and keep it open
3) start a pgbench workload on both the primary and the standby

After a while, KnownAssignedXidsGetAndSetXmin shows up in perf top,
consuming about 75% of CPU.

%%
PerfTop: 1060 irqs/sec kernel: 0.0% exact: 0.0% [4000Hz cycles:u],
(target_pid: 273361)
-------------------------------------------------------------------------------

73.92% postgres [.] KnownAssignedXidsGetAndSetXmin
1.40% postgres [.] base_yyparse
0.96% postgres [.] LWLockAttemptLock
0.84% postgres [.] hash_search_with_hash_value
0.84% postgres [.] AtEOXact_GUC
0.72% postgres [.] ResetAllOptions
0.70% postgres [.] AllocSetAlloc
0.60% postgres [.] _bt_compare
0.55% postgres [.] core_yylex
0.42% libc-2.27.so [.] __strlen_avx2
0.23% postgres [.] LWLockRelease
0.19% postgres [.] MemoryContextAllocZeroAligned
0.18% postgres [.] expression_tree_walker.part.3
0.18% libc-2.27.so [.] __memmove_avx_unaligned_erms
0.17% postgres [.] PostgresMain
0.17% postgres [.] palloc
0.17% libc-2.27.so [.] _int_malloc
0.17% postgres [.] set_config_option
0.17% postgres [.] ScanKeywordLookup
0.16% postgres [.] _bt_checkpage

%%

We tried to fix this by using a Bitmapset instead of the boolean array
KnownAssignedXidsValid, but this did not help much.

Using a doubly linked list instead helps a little more: we got +1000 tps
on a pgbench workload with the patched PostgreSQL. The general idea of this
patch is that, instead of memorizing which elements in KnownAssignedXids are
valid, let's maintain a doubly linked list of them. This solution works
in exactly the same way, except that taking a snapshot on the replica is
now O(running transactions) instead of O(head - tail), which is significantly
faster under some workloads. The patch reduces the CPU usage of
KnownAssignedXidsGetAndSetXmin to ~48% from ~74%, but does not eliminate
it from perf top.
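
To illustrate the difference in cost, here is a minimal standalone sketch.
It is not the patch itself: the struct, the function names, the fixed
CAPACITY and removal by slot index are all hypothetical simplifications.
It keeps an array of xids between tail and head, and compares a snapshot
that scans every slot with one that follows links between valid slots only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY 1024           /* hypothetical fixed size for the sketch */
#define NIL (-1)                /* "no slot" marker for the list links */

typedef uint32_t Xid;

/* Hypothetical stand-in for the KnownAssignedXids array and its metadata. */
typedef struct KnownXids
{
    Xid  xids[CAPACITY];
    bool valid[CAPACITY];       /* plays the role of KnownAssignedXidsValid */
    int  prev[CAPACITY];        /* links between valid slots only */
    int  next[CAPACITY];
    int  tail;                  /* oldest used slot */
    int  head;                  /* one past the newest used slot */
    int  first;                 /* first valid slot, or NIL */
    int  last;                  /* last valid slot, or NIL */
} KnownXids;

static void known_init(KnownXids *k)
{
    k->tail = k->head = 0;
    k->first = k->last = NIL;
}

/* Append an xid at head and link it at the end of the valid list. */
static int known_add(KnownXids *k, Xid xid)
{
    assert(k->head < CAPACITY);
    int slot = k->head++;
    k->xids[slot] = xid;
    k->valid[slot] = true;
    k->prev[slot] = k->last;
    k->next[slot] = NIL;
    if (k->last != NIL)
        k->next[k->last] = slot;
    else
        k->first = slot;
    k->last = slot;
    return slot;
}

/* Mark a slot invalid and unlink it in O(1). */
static void known_remove(KnownXids *k, int slot)
{
    assert(slot >= k->tail && slot < k->head && k->valid[slot]);
    k->valid[slot] = false;
    int p = k->prev[slot];
    int n = k->next[slot];
    if (p != NIL) k->next[p] = n; else k->first = n;
    if (n != NIL) k->prev[n] = p; else k->last = p;
}

/* Snapshot by scanning every slot: O(head - tail). */
static int snapshot_scan(const KnownXids *k, Xid *out, int *visited)
{
    int count = 0;
    *visited = 0;
    for (int i = k->tail; i < k->head; i++)
    {
        (*visited)++;
        if (k->valid[i])
            out[count++] = k->xids[i];
    }
    return count;
}

/* Snapshot by walking the list: O(number of still-valid xids). */
static int snapshot_list(const KnownXids *k, Xid *out, int *visited)
{
    int count = 0;
    *visited = 0;
    for (int i = k->first; i != NIL; i = k->next[i])
    {
        (*visited)++;
        out[count++] = k->xids[i];
    }
    return count;
}
```

With one old transaction still open and many later ones already finished,
snapshot_scan has to step over every dead slot between tail and head, while
snapshot_list visits only the surviving entries. The price is two extra
integers per slot and the link maintenance on every add and remove.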

The problem is easier to reproduce on PG13, since PG14 has some snapshot
optimizations.

Thanks!

Best regards, reshke
