Re: Improving connection scalability: GetSnapshotData()

From: Andres Freund <andres(at)anarazel(dot)de>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jonathan Katz <jkatz(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Improving connection scalability: GetSnapshotData()
Date: 2020-04-07 12:15:03
Message-ID: 20200407121503.zltbpqmdesurflnm@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

SEE BELOW: What, and what not, to do for v13.

Attached is a substantially polished version of my patches. Note that
the first three patches, as well as the last, are not intended to be
committed at this time / in this form - they're there to make testing
easier.

There is a lot of polish, but also a few substantial changes:

- To be compatible with old_snapshot_threshold I've revised the way
heap_page_prune_opt() deals with old_snapshot_threshold. Now
old_snapshot_threshold is only applied when we otherwise would have
been unable to prune (both at the time of the pd_prune_xid check, and
on individual tuples). This makes old_snapshot_threshold considerably
cheaper and causes fewer conflicts.

This required adding a version of HeapTupleSatisfiesVacuum that
returns the horizon, rather than doing the horizon test itself; that
way we can first test a tuple's horizon against the normal approximate
threshold (making it an accurate threshold if needed) and only if that
fails fall back to old_snapshot_threshold.

The main reason here was not to improve old_snapshot_threshold, but to
avoid a regression when it's being used. Because we need a horizon to
pass to old_snapshot_threshold, we'd have to fall back to computing an
accurate horizon too often.

- Previous versions of the patch had a TODO about computing horizons not
just for one of shared / catalog / data tables, but for all of them at
once, to avoid potentially needing to determine xmin horizons multiple
times within one transaction. For that I've renamed GetOldestXmin()
to ComputeTransactionHorizons() and added wrapper functions instead of
the different flag combinations we previously had for GetOldestXmin().

This allows us to get rid of the PROCARRAY_* flags, and PROC_RESERVED.

- To address Thomas' review comment about not accessing nextFullXid
without XidGenLock, I made latestCompletedXid a FullTransactionId (a
fxid is needed to be able to infer 64-bit xids for the horizons -
otherwise there is some danger they could wrap).

- While improving the comment around the snapshot caching, I decided that
the justification for correctness around not taking ProcArrayLock is too
complicated (in particular around setting MyProc->xmin). While
avoiding ProcArrayLock altogether is a substantial gain, the caching
itself helps a lot already. Seems best to leave that for a later step.

This means that the numbers for the very high connection counts aren't
quite as good.

- Plenty of small changes to address issues I found while
benchmarking. The only one of real note is that I had released
XidGenLock after ProcArrayLock in ProcArrayAdd/Remove. For 2pc that
causes noticeable unnecessary contention, because we'll wait for
XidGenLock while holding ProcArrayLock...

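The remark above about inferring 64-bit xids can be sketched as follows.
This is an illustration under stated assumptions, not the patch's code:
the type and function names are hypothetical, special/reserved xids are
ignored, and the 32-bit xid is assumed to lie within 2^31 of the 64-bit
reference value (which is what makes the inference unambiguous).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t SketchXid32;   /* hypothetical 32-bit, wrapping xid */
typedef uint64_t SketchFullXid; /* hypothetical 64-bit, non-wrapping xid */

/*
 * Widen a 32-bit xid to 64 bits, using a 64-bit reference xid that is
 * known to be "close" to it (less than 2^31 away in either direction).
 */
static SketchFullXid
sketch_widen_xid(SketchFullXid ref, SketchXid32 xid)
{
    SketchXid32 ref_low = (SketchXid32) ref;    /* low 32 bits of reference */
    SketchXid32 ahead = xid - ref_low;          /* modular distance forward */

    assert(ref >= UINT64_C(0x80000000));        /* keep the sketch underflow-free */

    if (ahead < UINT32_C(0x80000000))
        return ref + ahead;                     /* xid is at or after ref */

    /* xid is before ref; ref_low - xid is the modular distance backward */
    return ref - (SketchXid32) (ref_low - xid);
}
```

For example, with a reference whose low 32 bits are 10, the 32-bit xid
0xFFFFFFF0 is interpreted as belonging to the previous 2^32 "epoch"
rather than as a value far in the future.
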
I think this is pretty close to being committable.

But: This patch came in very late for v13, and it took me much longer to
polish it up than I had hoped (partially distraction due to various bugs
I found (in particular snapshot_too_old), partially covid19, partially
"hell if I know"). The patchset touches core parts of the system. While
both Thomas and David have done some review, they haven't for the latest
version (mea culpa).

In many other instances I would say that the above suggests slipping to
v14, given the timing.

The main reason I am considering pushing is that I think this patchset
addresses one of the most common critiques of postgres, as well as very
common, hard to fix, real-world production issues. GetSnapshotData() has
been a major bottleneck for about as long as I have been using postgres,
and this addresses that to a significant degree.

A second reason I am considering it is that, in my opinion, the changes
are not all that complicated and not even that large. At least not for a
change addressing a problem that we've long tried to solve.

Obviously we all have a tendency to think our own work is important, and
that we deserve a bit more leeway than others. So take the above with a
grain of salt.

Comments?

Greetings,

Andres Freund

Attachment Content-Type Size
v1-0001-TMP-work-around-missing-snapshot-registrations.patch text/x-diff 8.5 KB
v1-0002-Improve-and-extend-asserts-for-a-snapshot-being-s.patch text/x-diff 6.4 KB
v7-0001-TMP-work-around-missing-snapshot-registrations.patch text/x-diff 8.5 KB
v7-0002-Improve-and-extend-asserts-for-a-snapshot-being-s.patch text/x-diff 6.5 KB
v7-0003-Fix-xlogreader-fd-leak-encountered-with-twophase-.patch text/x-diff 1.1 KB
v7-0004-Move-delayChkpt-from-PGXACT-to-PGPROC-it-s-rarely.patch text/x-diff 9.4 KB
v7-0005-Change-the-way-backends-perform-tuple-is-invisibl.patch text/x-diff 127.2 KB
v7-0006-snapshot-scalability-Move-PGXACT-xmin-back-to-PGP.patch text/x-diff 21.9 KB
v7-0007-snapshot-scalability-Move-in-progress-xids-to-Pro.patch text/x-diff 48.5 KB
v7-0008-snapshot-scalability-Move-PGXACT-vacuumFlags-to-P.patch text/x-diff 18.1 KB
v7-0009-snapshot-scalability-Move-subxact-info-from-PGXAC.patch text/x-diff 18.8 KB
v7-0010-Remove-now-unused-PGXACT.patch text/x-diff 9.2 KB
v7-0011-snapshot-scalability-cache-snapshots-using-a-xact.patch text/x-diff 9.3 KB
