
clog double-dip in heap_hot_search_buffer

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: clog double-dip in heap_hot_search_buffer
Date: 2012-05-02 15:10:18
Message-ID: CA+TgmobwhcHrYmiH3=8uvztZYK=TCRytiYO7rX+hYdtU4dz8vg@mail.gmail.com
Lists: pgsql-hackers

heap_hot_search_buffer() does this:

            valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);

If it turns out that the tuple isn't valid (i.e. visible to our scan)
and we haven't yet found any live tuples in the current HOT chain,
then we check whether it's visible to anyone at all:

        if (all_dead && *all_dead &&
            HeapTupleSatisfiesVacuum(heapTuple->t_data, RecentGlobalXmin,
                                     buffer) != HEAPTUPLE_DEAD)
            *all_dead = false;

This is obviously an important optimization for accelerating index
cleanup, but it has an unfortunate side-effect: it considerably
increases the frequency of CLOG access.  Normally,
HeapTupleSatisfiesVisibility() will set hint bits on the tuple, but
sometimes it can't, because the inserter hasn't yet committed, the
inserter's commit record hasn't yet been flushed, the deleter hasn't
committed, or the deleter's commit record hasn't been flushed.
When that happens, HeapTupleSatisfiesVacuum() gets called a moment
later and repeats the same CLOG lookups.  It is of course possible for
a state change to happen in the interim, but that's not really a
reason to repeat the lookups; asking the same question twice in a row
just in case you should happen to get an answer you like better the
second time is not generally a good practice, even if it occasionally
works.
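The double lookup can be illustrated with a toy model.  This is a minimal sketch under loudly stated assumptions, not PostgreSQL code: ToyTuple, toy_clog, toy_clog_lookup(), toy_satisfies_visibility(), toy_vacuum_says_dead() and toy_check_tuple() are all hypothetical stand-ins, only the inserter's XID is modeled, and snapshots are ignored.  It counts simulated CLOG reads for one tuple checked the way the quoted heap_hot_search_buffer() code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy model, not PostgreSQL code.  A tuple carries only its
 * inserting XID plus two hint bits, and "CLOG" is a small array whose
 * every read is counted.
 */
typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactStatus;

typedef struct
{
    uint32_t xmin;           /* inserting transaction ID */
    bool     committed_hint; /* stands in for HEAP_XMIN_COMMITTED */
    bool     aborted_hint;   /* stands in for HEAP_XMIN_INVALID */
} ToyTuple;

#define TOY_MAX_XID 16
static XactStatus toy_clog[TOY_MAX_XID]; /* zero-initialized: all in progress */
static int clog_lookups;                 /* number of simulated CLOG reads */

static XactStatus toy_clog_lookup(uint32_t xid)
{
    assert(xid < TOY_MAX_XID);
    clog_lookups++;
    return toy_clog[xid];
}

/* Plays the role of HeapTupleSatisfiesVisibility(): hints first, else CLOG. */
static bool toy_satisfies_visibility(const ToyTuple *t)
{
    if (t->committed_hint)
        return true;
    if (t->aborted_hint)
        return false;
    /* Simplification: "inserter committed" means visible; no snapshots. */
    return toy_clog_lookup(t->xmin) == XACT_COMMITTED;
}

/* Plays the role of HeapTupleSatisfiesVacuum() == HEAPTUPLE_DEAD. */
static bool toy_vacuum_says_dead(const ToyTuple *t)
{
    if (t->committed_hint)
        return false;
    if (t->aborted_hint)
        return true;
    /* No hint bit, so the same CLOG question is asked a second time. */
    return toy_clog_lookup(t->xmin) == XACT_ABORTED;
}

/* Mirrors the per-tuple visibility-then-deadness pattern described above. */
static bool toy_check_tuple(const ToyTuple *t, bool *all_dead)
{
    bool valid = toy_satisfies_visibility(t);

    if (valid)
    {
        /* A visible tuple means the chain cannot be all-dead. */
        if (all_dead)
            *all_dead = false;
        return true;
    }
    if (all_dead && *all_dead && !toy_vacuum_says_dead(t))
        *all_dead = false;
    return false;
}
```

In this model, a tuple whose inserter is still in progress costs two CLOG reads for the same XID in one call to toy_check_tuple(), while a tuple whose hint bit is already set costs none.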

The attached patch adds a new function HeapTupleIsSurelyDead(), a
cut-down version of HeapTupleSatisfiesVacuum().  It assumes, first,
that we only care about distinguishing between dead and anything else,
and, second, that any transaction whose hint bits aren't yet set is
still running.  These assumptions allow it to be a whole lot simpler
than HeapTupleSatisfiesVacuum() and to get away without doing any CLOG
access.  The second assumption can only err toward reporting a tuple
as not dead, which is safe: at worst, an index entry that could have
been marked dead is left for later cleanup.  The patch also changes
heap_hot_search_buffer() to use this new function in lieu of
HeapTupleSatisfiesVacuum().

I found this problem by using 'perf record -e cs -g' and 'perf report
-g' to find out where context switches were happening.  It turns out
that this is a very significant contributor to CLOG-related context
switches.  Retesting with those same tools shows that the patch does
in fact make those context switches go away.  On a long pgbench test,
the effects of WALInsertLock contention, ProcArrayLock contention,
checkpoint-related latency, etc. will probably swamp the effect of the
patch.  On a short test, however, the effects are visible; and in
general anything that optimizes away access to heavily contended
shared memory data structures is probably a good thing.  Permanent
tables, scale factor 100, 30-second tests:

master:
tps = 22175.025992 (including connections establishing)
tps = 22072.166338 (including connections establishing)
tps = 22653.876341 (including connections establishing)

with patch:
tps = 26586.623556 (including connections establishing)
tps = 25564.098898 (including connections establishing)
tps = 25756.036647 (including connections establishing)
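The runs above can be summarized with simple arithmetic.  This sketch uses hypothetical helper names (mean_tps(), percent_change()) and takes a plain arithmetic mean of three runs per configuration, which is a rough summary rather than a rigorous benchmark analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n pgbench tps results. */
static double mean_tps(const double *tps, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += tps[i];
    return sum / (double) n;
}

/* Relative change of `patched` versus `baseline`, in percent. */
static double percent_change(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}

/* The three 30-second runs reported above (tps including connections). */
static const double master_tps[] = { 22175.025992, 22072.166338, 22653.876341 };
static const double patched_tps[] = { 26586.623556, 25564.098898, 25756.036647 };
#define N_RUNS (sizeof master_tps / sizeof master_tps[0])
```

With these figures the means come to about 22,300 tps on master and about 25,969 tps with the patch, an increase of roughly 16%; every patched run is also faster than every master run.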

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: surely-dead-v1.patch
Description: application/octet-stream (3.1 KB)

