Re: Slow standby snapshot

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Slow standby snapshot
Date: 2021-11-14 12:09:43
Message-ID: CANtu0ohzBFTYwdLtcanWo4+794WWUi7LY2rnbHyorJdE8_ZnGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, everyone.

I made a performance test to make sure the patch solves real issues
without performance regression.
Tests are made on 3 VM - one for primary, another - standby, latest
one - pgbench. It is Azure Standard_D16ads_v5 - 16 VCPU, 64GIB RAM,
Fast SSD.
5000 used as a number of connections (it is the max number of
connections for AWS - LEAST({DBInstanceClassMemory/9531392}, 5000)).

Setup:
primary:
max_connections=5000
listen_addresses='*'
fsync=off
standby:
primary_conninfo = 'user=postgres host=10.0.0.4 port=5432
sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres
target_session_attrs=any'
hot_standby_feedback = on
max_connections=5000
listen_addresses='*'
fsync=off

The test was run the following way:

# restart both standby and primary
# init fresh DB
./pgbench -h 10.0.0.4 -i -s 10 -U postgres -d postgres

# warm up primary for 10 seconds
./pgbench -h 10.0.0.4 -b simple-update -j 8 -c 16 -P 1 -T 10 -U
postgres postgres

# warm up standby for 10 seconds
./pgbench -h 10.0.0.5 -b select-only -j 8 -c 16 -n -P 1 -T 10 -U
postgres postgres

# then, run at the same(!) time (in parallel):

# simple-update on primary
./pgbench -h 10.0.0.4 -b simple-update -j 8 -c 16 -P 1 -T 180 -U
postgres postgres

# simple-select on standby
./pgbench -h 10.0.0.5 -b select-only -j 8 -c 16 -n -P 1 -T 180 -U
postgres postgres

# then, after 60 seconds after test start - start a long transaction
on the master
./psql -h 10.0.0.4 -c "BEGIN; select txid_current();SELECT
pg_sleep(5);COMMIT;" -U postgres postgres

I made 3 runs for both the patched and vanilla versions (current
master branch). One run of the patched version was retried because of
a significant difference in TPS (it is vCPU on VM with neighborhoods,
so, probably some isolation issue).
The result on the primary is about 23k-25k TPS for both versions.

So, graphics show a significant reduction of TPS on the secondary
while the long transaction is active (about 10%).
The patched version solves the issue without any noticeable regression
in the case of short-only transactions.
Also, transactions could be much shorted to reduce CPU - a few seconds
is enough.

Also, this is `perf diff` between `with` and `without` long
transaction recording.

Vanilla (+ 10.26% of KnownAssignedXidsGetAndSetXmin):
0.22% +10.26% postgres [.]
KnownAssignedXidsGetAndSetXmin
3.39% +0.68% [kernel.kallsyms] [k]
_raw_spin_unlock_irqrestore
2.66% -0.61% libc-2.31.so [.] 0x0000000000045dc1
3.77% -0.50% postgres [.] base_yyparse
3.43% -0.45% [kernel.kallsyms] [k] finish_task_switch
0.41% +0.36% postgres [.] pg_checksum_page
0.61% +0.31% [kernel.kallsyms] [k] copy_user_generic_string

Patched (+ 0.22%):
2.26% -0.40% [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.78% +0.39% [kernel.kallsyms] [k] copy_user_generic_string
0.22% +0.26% postgres [.] KnownAssignedXidsGetAndSetXmin
0.23% +0.20% postgres [.] ScanKeywordLookup
3.77% +0.19% postgres [.] base_yyparse
0.64% +0.19% postgres [.] pg_checksum_page
3.63% -0.18% [kernel.kallsyms] [k] finish_task_switch

If someone knows any additional performance tests that need to be done
- please share.

Best regards,
Michail.

Attachment Content-Type Size
image/png 20.7 KB
image/png 65.1 KB
image/png 18.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-11-14 12:17:41 Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display
Previous Message Amit Langote 2021-11-14 11:45:15 Re: support for MERGE