optimizing repeated MVCC snapshots

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: optimizing repeated MVCC snapshots
Date: 2012-01-05 13:54:46
Message-ID: CA+Tgmoa-rAm3=LJ4iF+45A7-NWkLoJL8wsuYQXG=-dtzxK1y2A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 3, 2012 at 2:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Another thought is that it should always be safe to reuse an old
>> snapshot if no transactions have committed or aborted since it was
>> taken
>
> Yeah, that might work better.  And it'd be a win for all MVCC snaps,
> not just the ones coming from promoted SnapshotNow ...

Here's a rough patch for that. Some benchmark results from a 32-core
Itanium server are included below. They look pretty good. Out of the
dozen configurations I tested, all but one came out ahead with the
patch. The loser was 80-clients on permanent tables, but I'm not too
worried about that since 80 clients on unlogged tables came out ahead.

This is not quite in a committable state yet, since it presumes atomic
64-bit reads and writes, and those aren't actually atomic everywhere.
What I'm thinking we can do is: on platforms where 8-byte reads and
writes are known to be atomic, we do as the patch does currently. On
any platform where we don't know that to be the case, we can move the
test I added to GetSnapshotData inside the lock, which should still be
a small win at low concurrencies. At high concurrencies it's a bit
iffy, because making GetSnapshotData's critical section shorter might
lead to lots of people trying to manipulate the ProcArrayLock spinlock
in very rapid succession. Even if that turns out to be an issue, I'm
inclined to believe that anyone who has enough concurrency for that to
matter probably also has atomic 8-byte reads and writes, and so the
most that will be needed is an update to our notion of which platforms
have that capability. If that turns out to be wrong, the other
obvious alternative is to not do the test at all unless it can be done
unlocked.

To support the above, I'm inclined to add a new file
src/include/atomic.h which optionally defines a macro called
ATOMIC_64BIT_OPS and macros atomic_read_uint64(r) and
atomic_write_uint64(l, r). That way we can eventually support (a)
architectures where 64-bit operations aren't atomic at all, (b)
architectures where ordinary 64-bit operations are atomic
(atomic_read_uint64(r) -> r, and atomic_write_uint64(l, r) -> l = r),
and (c) architectures (like 32-bit x86) where ordinary 64-bit
operations aren't atomic but special instructions (cmpxchg8b) can be
used to get that behavior.

m = master, s = with patch. scale factor 100, median of three
5-minute test runs. shared_buffers=8GB, checkpoint_segments=300,
checkpoint_timeout=30min, effective_cache_size=340GB,
wal_buffers=16MB, wal_writer_delay=20ms, listen_addresses='*',
synchronous_commit=off. binary modified with chatr +pd L +pi L and
run with rtsched -s SCHED_NOAGE -p 178.

Permanent Tables
================

m01 tps = 912.865209 (including connections establishing)
s01 tps = 916.848536 (including connections establishing)
m08 tps = 6256.429549 (including connections establishing)
s08 tps = 6364.214425 (including connections establishing)
m16 tps = 10795.373683 (including connections establishing)
s16 tps = 11038.233359 (including connections establishing)
m24 tps = 13710.400042 (including connections establishing)
s24 tps = 13836.823580 (including connections establishing)
m32 tps = 14574.758456 (including connections establishing)
s32 tps = 15125.196227 (including connections establishing)
m80 tps = 12014.498814 (including connections establishing)
s80 tps = 11825.302643 (including connections establishing)

Unlogged Tables
===============

m01 tps = 942.950926 (including connections establishing)
s01 tps = 953.618574 (including connections establishing)
m08 tps = 6492.238255 (including connections establishing)
s08 tps = 6537.197731 (including connections establishing)
m16 tps = 11363.708861 (including connections establishing)
s16 tps = 11561.193527 (including connections establishing)
m24 tps = 14656.659546 (including connections establishing)
s24 tps = 14977.226426 (including connections establishing)
m32 tps = 16310.814143 (including connections establishing)
s32 tps = 16644.921538 (including connections establishing)
m80 tps = 13422.438927 (including connections establishing)
s80 tps = 13780.256723 (including connections establishing)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
optimize-repeated-snapshots.patch application/octet-stream 4.0 KB
