Make HeapTupleSatisfiesMVCC more concurrent

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Make HeapTupleSatisfiesMVCC more concurrent
Date: 2015-08-18 22:55:56
Message-ID: CAMkU=1xVyQ0BC2ChEBAk+PGGJEwfrK0Qe9KWi6NJwBVOvW=C_g@mail.gmail.com
Lists: pgsql-hackers

When we check a tuple's visibility under MVCC, it has to pass checks that
the inserting transaction has committed, and that it committed before our
snapshot was taken. And similarly that the deleting transaction either
hasn't committed, or committed after our snapshot was taken.
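Roughly, and eliding the hint-bit, HEAP_MOVED_* and our-own-transaction
cases, the xmin half of HeapTupleSatisfiesMVCC currently does something
like this (a simplified sketch, not the literal tqual.c code):

    /* simplified sketch of the current check order for xmin */
    if (TransactionIdIsInProgress(xmin))
        return false;               /* inserter still running */
    else if (!TransactionIdDidCommit(xmin))
        return false;               /* inserter aborted or crashed */

    /* inserter committed; but did it commit before our snapshot? */
    if (XidInMVCCSnapshot(xmin, snapshot))
        return false;               /* committed after the snapshot was taken */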

XidInMVCCSnapshot is (or can be) very much cheaper
than TransactionIdIsInProgress, because the former touches only local
memory while the latter takes a highly contended lock (ProcArrayLock) and
inspects shared memory. We currently do the slow one first, but we could
do the fast one first and sometimes short-circuit the slow one entirely.
If the transaction is in our snapshot, it doesn't matter whether it is
still in progress or not.
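With the order swapped, the local check can short-circuit the
shared-memory check (again just a sketch of the idea, not the patch text):

    /* sketch of the reordered check: consult the snapshot first */
    if (XidInMVCCSnapshot(xmin, snapshot))
        return false;               /* in our snapshot: invisible to us whether
                                     * it is still running, committed, or aborted */
    if (TransactionIdIsInProgress(xmin))
        return false;
    if (!TransactionIdDidCommit(xmin))
        return false;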

This was discussed back in 2013 (
http://www.postgresql.org/message-id/CAMkU=1yy-YEQVvqj2xJitT1EFkyuFk7uTV_hrOMGyGMxpU=N+Q@mail.gmail.com),
and I wanted to revive it. The recent lwlock atomic changes haven't made
the problem irrelevant.

This patch swaps the order of the checks under some conditions. So that
hackers can readily do testing without juggling binaries, I've added an
experimental GUC which controls the behavior: JJ_SNAP=0 gives the original
(git HEAD) behavior, and JJ_SNAP=1 turns on the new behavior.
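(The wiring below is only illustrative of how such a toggle is usually
hooked up; see the attached patch for the real variable name and
registration.)

    /* illustrative only: a flag consulted inside the visibility check */
    bool    jj_snap_first = false;  /* toggled via SET JJ_SNAP = 1 */

    if (jj_snap_first && XidInMVCCSnapshot(xmin, snapshot))
        return false;               /* new fast path; otherwise fall through
                                     * to the existing order of checks */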

I've added some flag variables to record whether XidInMVCCSnapshot was
already called. XidInMVCCSnapshot is cheap, but not so cheap that we want
to call it twice if we can avoid it. Those flags would probably stay in
some form or another when the experimental GUC goes away.
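The flags amount to a simple memoization of the snapshot lookup, along
these lines (variable names here are illustrative, not the ones in the
patch):

    bool    xmin_snap_checked = false;  /* called XidInMVCCSnapshot yet? */
    bool    xmin_in_snap = false;       /* cached result, valid only if checked */

    if (!xmin_snap_checked)
    {
        xmin_in_snap = XidInMVCCSnapshot(xmin, snapshot);
        xmin_snap_checked = true;
    }
    if (xmin_in_snap)
        return false;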

We might be able to rearrange the series of "if-tests" to get rid of the
flag variables, but I didn't want to touch the HEAP_MOVED_OFF
and HEAP_MOVED_IN parts of the code, as those must get about zero
regression testing.

The situation where the performance of this really shows up is when there
are tuples that remain in an unresolved state while highly concurrent
processes keep stumbling over them.

I set that up by using the pgbench tables with a scale factor of 1, and
running at high concurrency a custom query that seq_scans the accounts
table:

pgbench -f <(echo 'select sum(abalance) from pgbench_accounts') -T 30 \
-n -c32 -j32 --startup='set JJ_SNAP=1'

While the test is contrived, it reproduces complaints I've seen on several
forums.

To create the burden of unresolved tuples, I open psql and run:
begin; update pgbench_accounts set abalance = 1 - abalance;

...and leave it uncommitted for a while.

Representative numbers for test runs of the above custom query on an
8-CPU machine:

tps = 542  regardless of JJ_SNAP, with no in-progress tuples
tps =  30  JJ_SNAP=0, with the uncommitted bulk update
tps = 364  JJ_SNAP=1, with the uncommitted bulk update

A side effect of making this change would be that a query which finds a
tuple inserted or deleted by a transaction still in the query's snapshot
never checks whether that transaction committed, and so it doesn't set the
hint bit if the transaction did in fact commit or abort. Some future query
with a newer snapshot will have to do that instead. It is at least
theoretically possible that many hint bits could fail to get set while the
buffer is still dirty in shared_buffers, so the buffer has to be dirtied
again once they are eventually set. I doubt this would be significant, but
if anyone has a test case which they think could turn up a problem in this
area, please try it out or describe it.
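For context, the hint bits are set on the very path the early return now
skips: they are only written after the clog lookup, roughly like this
(SetHintBits is the tqual.c helper; the rest is a sketch):

    if (XidInMVCCSnapshot(xmin, snapshot))
        return false;               /* early out: no clog lookup, so no
                                     * hint bit gets set here */
    if (TransactionIdIsInProgress(xmin))
        return false;
    if (TransactionIdDidCommit(xmin))
        SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED, xmin);
    else
    {
        SetHintBits(tuple, buffer, HEAP_XMIN_INVALID, InvalidTransactionId);
        return false;
    }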

There are other places in tqual.c which could probably use similar
re-ordering tricks, but this is the one for which I have a reproducible
test case.

Cheers

Jeff

Attachment Content-Type Size
SatisfiesMVCC_reorder_v001.patch application/octet-stream 3.8 KB
