Re: Process local hint bit cache

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Process local hint bit cache
Date: 2011-06-29 14:09:06
Message-ID: BANLkTikgTc1An=WVo_JpJb-qKXrx+0Tf9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 2, 2011 at 8:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:

>> aside:
>> Moving TransactionIdInProgress below TransactionIdDidCommit can help
>> in once sense: TransactionIdDidCommit grabs the XidStatus but discards
>> the knowledge if the transaction is known aborted.
>
> Doesn't the single-entry cache help with that?

Yes, it does.

Merlin,

I'm a little worried by some of the statements around this patch. It
would be good to get a single clear analysis, then build the patch
from that.

* TransactionIdDidCommit grabs XidStatus and then keeps it - if the
xact is known aborted or known committed.
* In the top of the patch you mention that "It's tempting to try and
expand the simple last transaction status in
transam.c but this check happens too late in the critical visibility
routines to provide much benefit. For the fastest possible cache
performance with the least amount of impact on scan heavy cases, you have
to maintain the cache here if you want to avoid dirtying the page
based on interaction with the cache."

We call TransactionIdIsKnownCompleted() before we touch shared memory
in TransactionIdIsInProgress(), which seems early enough for me.
Maybe I've misunderstood your comments.

Also, the reason we set the hint bits is (in my understanding) so that
other users will avoid having to do clog lookups. If we only cache xid
status locally, we would actually generate more clog lookups, not
less. I realise that reduces I/O, so it seems to be a straight
tradeoff between I/O and clog access.

I think we need to be clear what the goals of this are - different
people may have different tuning goals - perhaps we need a parameter
for that because in different situations either one might make sense.

For me, it makes sense for us to set hint bits on already-dirty pages
when they are written out by the bgwriter. At the moment we do very
little to amortise the I/O that must already happen.

You could easily get frustrated here and lose interest. I hope not.
This is interesting stuff and you are well qualified to find a
solution. I just want to be sure we define exactly what the problem
is, and publish all of the analysis first. Writing the final patch is
probably the easiest part of that task.

Anyway, I don't think this patch is ready for commit this time around,
but good hunting. There is some treasure to be found here, I'm
certain.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2011-06-29 14:30:51 Re: [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed
Previous Message Yeb Havinga 2011-06-29 13:44:18 Re: Parameterized aggregate subquery (was: Pull up aggregate subquery)