Re: [HACKERS] Proposal for CSN based snapshots

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Andres Freund <andres(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Ants Aasma <ants(at)cybertec(dot)at>, Bruce Momjian <bruce(at)momjian(dot)us>, obartunov <obartunov(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)postgrespro(dot)ru>, Borodin Vladimir <root(at)simply(dot)name>
Subject: Re: [HACKERS] Proposal for CSN based snapshots
Date: 2017-12-04 15:07:10
Message-ID: 8a855f33-2581-66bf-85f7-0b99239edbda@postgrespro.ru
Lists: pgsql-hackers

Hi hackers,

Here is a new version of the patch with some improvements, rebased to
117469006b.

Performance on pgbench tpcb with subtransactions is now slightly better
than master; see the picture 'savepoints2'. This was achieved by
removing unnecessary exclusive locking on CSNLogControlLock in
SubTransSetParent. After that change, both versions spend most of their
wait time on XidGenLock in GetNewTransactionId.

Performance on default pgbench tpcb is also improved. At scale 500, csn
is up to 30% faster than master; see the picture 'tpcb500'. These
improvements come from slight optimizations of GetSnapshotData and from
refreshing RecentGlobalXmin less often. At scale 1500, csn is slightly
faster at up to 200 clients, but then degrades steadily; see the picture
'tpcb1500'. Nevertheless, CSN-related code paths do not show up in perf
profiles or LWLock wait statistics [1]. I think what we are seeing here
is, again, that once some bottlenecks are removed, the fast degradation
of LWLocks under contention leads to a net drop in performance. With
this in mind, I tried running the same benchmarks with the patch from
Yura Sokolov [2], which should improve LWLock performance on NUMA
machines. Indeed, with this patch csn outperforms master at every
client count measured, as you can see in the picture 'tpcb1500'. The
LWLock change affects csn a lot more than master, which also suggests
that we are observing a superlinear degradation of LWLocks under
increasing contention.

Next, I plan to update the comments, many of which have become out of
date, and to work on logical replication.

[1] To collect LWLock wait statistics, I sample pg_stat_activity, and
also use a bcc script by Andres Freund:
https://www.postgresql.org/message-id/flat/20170622210845(dot)d2hsbqv6rxu2tiye%40alap3(dot)anarazel(dot)de#20170622210845(dot)d2hsbqv6rxu2tiye(at)alap3(dot)anarazel(dot)de

[2]
https://www.postgresql.org/message-id/flat/2968c0be065baab8865c4c95de3f435c(at)postgrespro(dot)ru#2968c0be065baab8865c4c95de3f435c@postgrespro.ru
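
The pg_stat_activity sampling mentioned in [1] could look roughly like
the sketch below. It is not the script used for these measurements: the
function names and the sample rows are hypothetical, and only the
wait_event_type and wait_event columns (with the 'LWLock' type reported
by PostgreSQL 10 and later) are taken from pg_stat_activity itself. The
idea is to run the query repeatedly during a benchmark and count how
often each LWLock appears as a wait event.

```python
from collections import Counter

# Query to run periodically against the server under test. Both columns
# exist in pg_stat_activity; how often to sample is left to the caller.
SAMPLE_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
"""


def tally_lwlock_waits(samples):
    """Count LWLock wait events over rows collected from SAMPLE_QUERY.

    samples: iterable of (wait_event_type, wait_event) tuples, pooled
    from any number of sampling rounds. (Hypothetical helper.)
    """
    counts = Counter()
    for wait_event_type, wait_event in samples:
        # Keep only lightweight-lock waits; heavyweight locks, I/O and
        # other wait types are ignored.
        if wait_event_type == "LWLock":
            counts[wait_event] += 1
    return counts


def wait_shares(counts):
    """Convert raw counts into each lock's fraction of all LWLock waits."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {event: n / total for event, n in counts.items()}


if __name__ == "__main__":
    # Made-up rows for illustration only, not measured data.
    rows = [
        ("LWLock", "XidGenLock"),
        ("LWLock", "XidGenLock"),
        ("Lock", "transactionid"),
        ("LWLock", "ProcArrayLock"),
    ]
    for event, share in wait_shares(tally_lwlock_waits(rows)).items():
        print(f"{event}: {share:.2f}")
```

Sampling this way only shows where backends are waiting at the moment
of each sample, so it complements rather than replaces tracing tools
such as the bcc script.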

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
image/png 13.4 KB
image/png 13.3 KB
image/png 17.0 KB
csn-v8.patch text/x-patch 440.8 KB
