Re: limiting hint bit I/O

From: Jim Nasby <jim(at)nasby(dot)net>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: limiting hint bit I/O
Date: 2011-01-18 08:47:07
Message-ID: C45117AC-5AE6-4101-B722-6CE4E159D154@nasby.net
Lists: pgsql-hackers

On Jan 16, 2011, at 4:37 PM, Kevin Grittner wrote:
> Robert Haas wrote:
>
>> a quick-and-dirty attempt to limit the amount of I/O caused by hint
>> bits. I'm still very interested in knowing what people think about
>> that.
>
> I found the elimination of the response-time spike promising. I
> don't think I've seen enough data yet to feel comfortable endorsing
> it, though. I guess the question in my head is: how much of the
> lingering performance hit was due to having to go to clog and how
> much was due to competition with the deferred writes? If much of it
> is due to repeated recalculation of visibility based on clog info, I
> think there would need to be some way to limit how many times that
> happened before the hint bits were saved.

What if we sped up the case where hint bits aren't set? Has anyone collected data on the actual pain points of checking visibility when hint bits aren't set? How about when setting hint bits is intentionally delayed? I wish we had some more infrastructure around the XIDCACHE counters; having that info available for people's general workloads might be extremely valuable. Even if I were to compile with it turned on, it seems the only way to get at it is via stderr, which is very hard to deal with.

Lacking performance data (and for my own education), I've spent the past few hours studying HeapTupleSatisfiesNow(). If I'm understanding it correctly, the three critical functions from a performance standpoint are TransactionIdIsCurrentTransactionId, TransactionIdIsInProgress and TransactionIdDidCommit. Note that all 3 can potentially be called twice; once to check xmin and once to check xmax.

ISTM TransactionIdIsCurrentTransactionId is missing a shortcut: shouldn't we be able to immediately return false if the XID we're checking is older than some value, like global xmin? Maybe it's only worth checking that case if we hit a subtransaction, but if the check is faster than one or two loops through the binary search... I would think this at least warrants a one-XID cache à la cachedFetchXidStatus (though it would need to be a different cache). Another issue is that TransactionIdIsInProgress will call this function as well, unless it skips out because the transaction is < RecentXmin.
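To make the proposed shortcut concrete, here is a minimal standalone sketch, not the actual PostgreSQL function. The names XactState, oldest_running and xid_is_current are hypothetical, the sorted array stands in for the backend's own subtransaction XIDs, and the searches counter exists only to show when the binary search is skipped. It assumes none of our own XIDs can be older than the horizon, since our own transaction is still running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;               /* stand-in for TransactionId */

/* Wraparound-aware "a is older than b" using modulo-2^32 arithmetic. */
static bool xid_precedes(Xid a, Xid b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical per-backend state, invented for this sketch. */
typedef struct
{
    const Xid *own_xids;            /* our (sub)transaction XIDs, sorted oldest first */
    size_t     n_own;
    Xid        oldest_running;      /* assumed horizon, e.g. global xmin */
    size_t     searches;            /* illustration only: binary searches performed */
} XactState;

static bool xid_is_current(XactState *st, Xid xid)
{
    size_t lo = 0;
    size_t hi;

    assert(st != NULL);

    /* Shortcut: anything older than the horizon cannot be one of ours. */
    if (xid_precedes(xid, st->oldest_running))
        return false;

    /* Otherwise fall back to the binary search over our own XIDs. */
    st->searches++;
    hi = st->n_own;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (st->own_xids[mid] == xid)
            return true;
        if (xid_precedes(st->own_xids[mid], xid))
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}
```

Whether the extra comparison pays for itself depends on how often old XIDs reach this function, which is exactly the data I don't have.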

TransactionIdIsInProgress does a fair amount of easy checking already... the biggest thing is that if it's less than RecentXmin we bounce out immediately. If we can't bounce out immediately though, this routine gets pretty expensive unless the XID is currently running and is top-level. It's worse if there are subxacts and can be horribly bad if any subxact caches have overflowed. Note that if anything has overflowed, then we end up going to clog and possibly pg_subtrans.
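As I read it, the cost tiers look roughly like the sketch below. This is a simplified model of the description above, not the real routine: BackendEntry, CheckPath and classify_in_progress_check are hypothetical names, and the "slow lookup" tier stands in for the clog and possibly pg_subtrans visits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;               /* stand-in for TransactionId */

/* Which tier of work answered the question (names invented for this sketch). */
typedef enum
{
    PATH_RECENT_XMIN,               /* cheapest: older than RecentXmin */
    PATH_TOP_LEVEL,                 /* matched a running top-level XID */
    PATH_SUBXACT_CACHE,             /* matched a cached subtransaction XID */
    PATH_SLOW_LOOKUP,               /* a cache overflowed: clog / pg_subtrans needed */
    PATH_NOT_RUNNING                /* scanned everything, nothing overflowed */
} CheckPath;

/* Hypothetical view of one running backend. */
typedef struct
{
    Xid        top_xid;
    const Xid *subxids;             /* cached subtransaction XIDs */
    size_t     n_subxids;
    bool       overflowed;          /* subxact cache overflowed */
} BackendEntry;

static bool xid_precedes(Xid a, Xid b)
{
    return (int32_t)(a - b) < 0;
}

static CheckPath classify_in_progress_check(Xid xid, Xid recent_xmin,
                                            const BackendEntry *backends, size_t n)
{
    bool any_overflowed = false;

    assert(backends != NULL || n == 0);

    /* Bounce out immediately for XIDs older than RecentXmin. */
    if (xid_precedes(xid, recent_xmin))
        return PATH_RECENT_XMIN;

    for (size_t i = 0; i < n; i++)
    {
        if (backends[i].top_xid == xid)
            return PATH_TOP_LEVEL;
        for (size_t j = 0; j < backends[i].n_subxids; j++)
        {
            if (backends[i].subxids[j] == xid)
                return PATH_SUBXACT_CACHE;
        }
        if (backends[i].overflowed)
            any_overflowed = true;
    }

    /* Only an overflowed cache forces the expensive lookups. */
    return any_overflowed ? PATH_SLOW_LOOKUP : PATH_NOT_RUNNING;
}
```

Note that an XID that is not running still pays for the full scan before we learn anything, which is why the first tier matters so much.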

Finally, TransactionIdDidCommit hits clog.

So the degenerate cases seem to be:

- Really old XIDs. These suck because there's a good chance we'll have to read from clog.
- XIDs > RecentXmin that are not currently running top-level transactions. The pain here increases with subtransaction use.

For the second case, if we can ensure that RecentXmin is not very old then there's generally a smaller chance that TransactionIdIsInProgress has to do a lot of work. My experience is that most systems that have a high transaction rate don't end up with a lot of long-running transactions. Storing a list of the X oldest transactions would allow us to keep RecentXmin closer to the most recent XID.

For the first case, we should be able to create a more optimized clog lookup method that works for older XIDs. If we restrict this to XIDs that are older than GlobalXmin then we can simplify things because we don't have to worry about transactions that are in-progress. We also don't need to differentiate between subtransactions and their parents (though we obviously need to figure out whether a subtransaction is considered to be committed or not). Because we're restricting this to XIDs that we know we can determine the state of, we only need to store a maximum of 1 bit per XID. That's already half the size of clog. But because we don't have to build this list on the fly (we don't need to update it on every commit/abort as long as we know the range of XIDs that are stored), we don't have to support random writes. That means we can use a structure that's more complex to maintain than a simple bitmap. Or maybe we stick with a bitmap but compress it.
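
A rough sketch of the plain (uncompressed) bitmap variant, purely to illustrate the 1-bit-per-XID idea. CommitBitmap and its functions are hypothetical names, not anything in the tree. It assumes every XID in the covered range already has a final committed/aborted state, and it ignores the special reserved XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Xid;               /* stand-in for TransactionId */

/* Hypothetical structure: one bit per XID over a known, fixed range. */
typedef struct
{
    Xid       base;                 /* first XID covered */
    uint32_t  count;                /* number of XIDs covered */
    uint8_t  *bits;                 /* 1 = committed, 0 = aborted */
} CommitBitmap;

/* Bytes needed at one bit per XID. */
static size_t bitmap_bytes(uint32_t count)
{
    return ((size_t) count + 7) / 8;
}

static bool bitmap_init(CommitBitmap *bm, Xid base, uint32_t count)
{
    size_t nbytes = bitmap_bytes(count);

    bm->base = base;
    bm->count = count;
    bm->bits = calloc(nbytes > 0 ? nbytes : 1, 1);  /* all aborted initially */
    return bm->bits != NULL;
}

/* True if xid falls inside the covered range (unsigned math handles wraparound). */
static bool bitmap_covers(const CommitBitmap *bm, Xid xid)
{
    return (uint32_t)(xid - bm->base) < bm->count;
}

/* Used only while building the bitmap, not on every commit. */
static void bitmap_set_committed(CommitBitmap *bm, Xid xid)
{
    uint32_t off = xid - bm->base;

    assert(off < bm->count);
    bm->bits[off / 8] |= (uint8_t)(1u << (off % 8));
}

static bool bitmap_did_commit(const CommitBitmap *bm, Xid xid)
{
    uint32_t off = xid - bm->base;

    assert(off < bm->count);
    return ((bm->bits[off / 8] >> (off % 8)) & 1u) != 0;
}
```

Callers would check bitmap_covers() first and fall back to the normal clog path for anything outside the range. A compressed or otherwise fancier structure could sit behind the same lookup interface.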
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
