| From: | Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com> | 
|---|---|
| To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> | 
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> | 
| Subject: | Re: Slow standby snapshot | 
| Date: | 2021-11-14 12:09:43 | 
| Message-ID: | CANtu0ohzBFTYwdLtcanWo4+794WWUi7LY2rnbHyorJdE8_ZnGg@mail.gmail.com | 
| Lists: | pgsql-hackers | 
Hello, everyone.
I ran a performance test to confirm that the patch solves a real issue
without introducing a performance regression.
The tests were run on 3 VMs: one for the primary, one for the standby,
and the last one for pgbench. Each is an Azure Standard_D16ads_v5:
16 vCPUs, 64 GiB RAM, fast SSD.
max_connections is set to 5000 (this is the maximum number of
connections on AWS: LEAST({DBInstanceClassMemory/9531392}, 5000)).
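To make the arithmetic behind that formula concrete, here is a small sketch that evaluates it for a given memory size in bytes. The function name is made up for illustration, and the byte values in the usage note are nominal RAM sizes; DBInstanceClassMemory on a real instance is somewhat lower than the nominal RAM, so treat the inputs as an assumption.

```shell
# Hypothetical helper: evaluates LEAST(DBInstanceClassMemory/9531392, 5000)
# for a memory size given in bytes, using integer division.
aws_default_max_connections() {
    mem_bytes=$1
    by_memory=$((mem_bytes / 9531392))
    # LEAST(): take the smaller of the memory-derived value and the 5000 cap
    if [ "$by_memory" -lt 5000 ]; then
        echo "$by_memory"
    else
        echo 5000
    fi
}
```

With a nominal 64 GiB (68719476736 bytes) the memory-derived value is above the cap, so the function prints 5000, which is the value used in this test.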
Setup:
    primary:
        max_connections=5000
        listen_addresses='*'
        fsync=off
    standby:
        primary_conninfo = 'user=postgres host=10.0.0.4 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
        hot_standby_feedback = on
        max_connections=5000
        listen_addresses='*'
        fsync=off
The test was run the following way:
# restart both standby and primary
# init fresh DB
./pgbench -h 10.0.0.4 -i -s 10 -U postgres -d postgres
# warm up primary for 10 seconds
./pgbench -h 10.0.0.4 -b simple-update -j 8 -c 16 -P 1 -T 10 -U postgres postgres
# warm up standby for 10 seconds
./pgbench -h 10.0.0.5 -b select-only -j 8 -c 16 -n -P 1 -T 10 -U postgres postgres
# then, run at the same(!) time (in parallel):
# simple-update on primary
./pgbench -h 10.0.0.4 -b simple-update -j 8 -c 16 -P 1 -T 180 -U postgres postgres
# select-only on standby
./pgbench -h 10.0.0.5 -b select-only -j 8 -c 16 -n -P 1 -T 180 -U postgres postgres
# then, 60 seconds after the test starts, run a long transaction on the primary
./psql -h 10.0.0.4 -c "BEGIN; select txid_current();SELECT pg_sleep(5);COMMIT;" -U postgres postgres
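The parallel phase can be scripted so that the timing is repeatable between runs. The following is only a sketch of one way to do it, not the script actually used for these results: the hosts, flags, and the 60-second delay are taken from the commands above, while the variable and function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the parallel phase. Names below are illustrative assumptions.
PRIMARY_HOST=${PRIMARY_HOST:-10.0.0.4}
STANDBY_HOST=${STANDBY_HOST:-10.0.0.5}
PGBENCH=${PGBENCH:-./pgbench}
PSQL=${PSQL:-./psql}
DURATION=${DURATION:-180}            # seconds for each pgbench run
LONG_TX_DELAY=${LONG_TX_DELAY:-60}   # seconds before the long transaction

run_parallel_phase() {
    # simple-update load on the primary, in the background
    "$PGBENCH" -h "$PRIMARY_HOST" -b simple-update -j 8 -c 16 -P 1 \
        -T "$DURATION" -U postgres postgres &
    primary_pid=$!

    # select-only load on the standby, in the background
    "$PGBENCH" -h "$STANDBY_HOST" -b select-only -j 8 -c 16 -n -P 1 \
        -T "$DURATION" -U postgres postgres &
    standby_pid=$!

    # after the delay, hold a transaction with an assigned xid open on the primary
    sleep "$LONG_TX_DELAY"
    "$PSQL" -h "$PRIMARY_HOST" -U postgres \
        -c "BEGIN; select txid_current();SELECT pg_sleep(5);COMMIT;" postgres

    # wait for both pgbench runs to finish
    wait "$primary_pid" "$standby_pid"
}
```

Calling `run_parallel_phase` after the warm-up steps starts both loads and the long transaction; the per-second progress from `-P 1` of both pgbench processes is interleaved on stdout, so redirecting each to its own file may be more practical.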
I made 3 runs each for the patched and vanilla versions (current
master branch). One run of the patched version was repeated because of
a significant difference in TPS (these are vCPUs on a shared VM host
with noisy neighbors, so it was probably an isolation issue).
The result on the primary is about 23k-25k TPS for both versions.
The graphs show a significant reduction of TPS on the standby
(about 10%) while the long transaction is active.
The patched version solves the issue without any noticeable regression
in the case of short-only transactions.
Also, the transaction can be much shorter and still affect CPU usage - a
few seconds is enough.
Below is the `perf diff` between the recordings made with and without
the long transaction.
Vanilla (+ 10.26% of KnownAssignedXidsGetAndSetXmin):
     0.22%    +10.26%  postgres            [.] KnownAssignedXidsGetAndSetXmin
     3.39%     +0.68%  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
     2.66%     -0.61%  libc-2.31.so        [.] 0x0000000000045dc1
     3.77%     -0.50%  postgres            [.] base_yyparse
     3.43%     -0.45%  [kernel.kallsyms]   [k] finish_task_switch
     0.41%     +0.36%  postgres            [.] pg_checksum_page
     0.61%     +0.31%  [kernel.kallsyms]   [k] copy_user_generic_string
Patched (+ 0.26% of KnownAssignedXidsGetAndSetXmin):
     2.26%     -0.40%  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
     0.78%     +0.39%  [kernel.kallsyms]   [k] copy_user_generic_string
     0.22%     +0.26%  postgres            [.] KnownAssignedXidsGetAndSetXmin
     0.23%     +0.20%  postgres            [.] ScanKeywordLookup
     3.77%     +0.19%  postgres            [.] base_yyparse
     0.64%     +0.19%  postgres            [.] pg_checksum_page
     3.63%     -0.18%  [kernel.kallsyms]   [k] finish_task_switch
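For anyone who wants to reproduce a comparison like this, the sketch below shows one possible way to take two recordings and diff them. The exact `perf` invocations used for the numbers above are not given in this message, so the system-wide recording, the 30-second window, and the function and file names are all assumptions.

```shell
# Sketch only: recording options and names are assumptions, not the
# commands used to produce the output above.
PERF=${PERF:-perf}
RECORD_SECONDS=${RECORD_SECONDS:-30}

# Record a system-wide profile into the given file for RECORD_SECONDS.
record_profile() {
    "$PERF" record -a -o "$1" -- sleep "$RECORD_SECONDS"
}

# Compare two recordings: the baseline first, then the one taken
# while the long transaction was open.
diff_profiles() {
    "$PERF" diff "$1" "$2"
}
```

For example, run `record_profile without.data` on the standby before the long transaction starts, `record_profile with.data` while it is open, and then `diff_profiles without.data with.data`.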
If anyone knows of additional performance tests that should be run,
please share.
Best regards,
Michail.
| Attachment | Content-Type | Size | 
|---|---|---|
|   | image/png | 20.7 KB | 
|   | image/png | 65.1 KB | 
|   | image/png | 18.7 KB | 