Re: Proposal for CSN based snapshots

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Andres Freund <andres(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Ants Aasma <ants(at)cybertec(dot)at>, Bruce Momjian <bruce(at)momjian(dot)us>, obartunov <obartunov(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)postgrespro(dot)ru>, Borodin Vladimir <root(at)simply(dot)name>
Subject: Re: Proposal for CSN based snapshots
Date: 2017-08-01 14:11:56
Message-ID: add4d640-742a-2d15-1dd2-a8ed3b5a0943@postgrespro.ru
Lists: pgsql-hackers

Hi all,

So I did some more experiments on this patch.

* I fixed the bug with duplicate tuples that I mentioned in my previous
message. Indeed, oldestActiveXid could be advanced past the
transaction's xid before the transaction set its clog status. This
happened because oldestActiveXid is calculated from the CSN log
contents, and we wrote to the CSN log before writing to clog. The fix is
to write to clog before the CSN log (TransactionIdAsyncCommitTree).

* We can remove the exclusive locking on CSNLogControlLock when setting
the CSN for a transaction (CSNLogSetPageStatus). When we assign a CSN to
a transaction and its children, the atomicity is guaranteed by using an
intermediate state (COMMITSEQNO_COMMITTING), so it doesn't matter if
this function is not atomic in itself. The shared lock should suffice here.

* At throughputs of about 100k TPS, we allocate ~1k CSN log pages per
second. This is done under an exclusive lock on the CSN control lock and
noticeably increases contention. To alleviate this, I allocate new pages
in batches (ExtendCSNLOG).

* When advancing oldestActiveXid, we scan the CSN log to find an xid that
is still in progress. To do that, we increment the xid and query its CSN
using the high-level function, which acquires and releases the lock and
looks up the log page for each xid. I wrote some code to acquire the
lock only once and then scan the pages (CSNLogSetPageStatus).

* With bigger buffers, the linear page lookup that the SLRU currently
uses becomes slow. I added a shared dynahash table to speed up this
lookup.

* I merged in recent changes from master (up to 7e1fb4). Unfortunately I
didn't have enough time to fix logical replication and snapshot import,
so those are completely broken for now.
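
The intermediate-state argument from the CSNLogSetPageStatus item above
can be illustrated with a small standalone sketch. This is not code from
the patch: the constants, the toy array standing in for the CSN log, and
all function names are hypothetical. It assumes that every member of the
transaction tree is marked "committing" before the CSN is allocated, and
that a reader who finds that marker must wait instead of deciding
visibility.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical toy values; the patch defines its own special CSNs. */
#define CSN_INPROGRESS   UINT64_C(0)
#define CSN_COMMITTING   UINT64_C(1)
#define CSN_FIRST_NORMAL UINT64_C(2)
#define NXIDS 8

typedef enum { XID_INVISIBLE, XID_VISIBLE, XID_MUST_WAIT } Visibility;

/* Toy stand-in for the CSN log: one slot per xid, zero = in progress. */
static _Atomic uint64_t csn_log[NXIDS];
static _Atomic uint64_t next_csn = CSN_FIRST_NORMAL;

/* Phase 1: flag the parent and all children as "committing". */
static void mark_committing(const uint32_t *xids, int n)
{
    for (int i = 0; i < n; i++)
    {
        assert(xids[i] < NXIDS);
        atomic_store(&csn_log[xids[i]], CSN_COMMITTING);
    }
}

/*
 * Phase 2: allocate the CSN only after phase 1 has finished, then write
 * it slot by slot. The loop is not atomic, and does not need to be.
 */
static uint64_t assign_csn(const uint32_t *xids, int n)
{
    uint64_t csn = atomic_fetch_add(&next_csn, 1);

    for (int i = 0; i < n; i++)
    {
        assert(xids[i] < NXIDS);
        atomic_store(&csn_log[xids[i]], csn);
    }
    return csn;
}

/* A snapshot is the next CSN to be handed out. */
static uint64_t take_snapshot(void)
{
    return atomic_load(&next_csn);
}

static Visibility xid_visibility(uint32_t xid, uint64_t snapshot_csn)
{
    uint64_t csn;

    assert(xid < NXIDS);
    csn = atomic_load(&csn_log[xid]);
    if (csn == CSN_INPROGRESS)
        return XID_INVISIBLE;
    if (csn == CSN_COMMITTING)
        return XID_MUST_WAIT;   /* outcome not decided yet for readers */
    return csn < snapshot_csn ? XID_VISIBLE : XID_INVISIBLE;
}
```

Under these assumptions, a snapshot taken after the CSN was allocated
can only find a tree member in the "committing" state or with its final
CSN, never as plain in progress, so it cannot see half of the tree
committed. That is why a shared lock on the individual writes would be
enough in this toy model.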

I ran some pgbench tests with these tweaks (tpcb-like, 72 cores, scale
500). The throughput is good at lower client counts (at 50 clients it's
35% higher than on master), but then it degrades steadily. Beyond 250
clients it's already lower than master; see the attached graph. In perf
reports the CSN-related functions have almost vanished, and I see lots
of time spent working with clog. This is probably a case where making
some parts faster worsens contention in other parts, so overall we get
a performance loss. Hilariously, at some point I saw a big performance
increase after adding some debug printfs. I wanted to try some things
with the clog next, but for now I'm out of time.

The new version of the patch is attached. Last time I apparently diff'ed
it the other way around; now it should apply fine.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment          Content-Type   Size
(name not shown)    image/png      12.9 KB
csn-v6.patch        text/x-diff    421.2 KB
