Re: Process local hint bit cache

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Process local hint bit cache
Date: 2011-03-30 14:05:51
Message-ID: AANLkTinWMup4SejxE94PyNwjQa_x=eZuAv39bpxSg8RY@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 29, 2011 at 4:34 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> In a previous thread
> (http://postgresql.1045698.n5.nabble.com/Set-hint-bits-upon-eviction-from-BufMgr-td4264323.html)
> I was playing with the idea of granting the bgwriter the ability to
> do last-chance hint bit setting before evicting a page. I still
> think that might be a good idea, especially in scenarios where you
> get to set the PD_ALL_VISIBLE bit when writing out the page, which is
> a huge win in some cases.  That said, it bugged me that it didn't
> really fix the general case of large data insertions and the
> subsequent i/o problems from setting the hint bits, which is my
> primary objective in the short term.
>
> So I went back to the drawing board, reviewed the archives, and came
> up with a new proposal.  I'd like to see a process local clog page
> cache of around 1-4 pages (8-32kb typically) that would replace the
> current TransactionLogFetch last xid cache. It should be small,
> because I doubt more would really help much (see below) and we want to
> keep this data in the tight cpu caches since it's going to be
> constantly scanned.
>
> The cache itself will be the clog pages and a small header per page
> which will contain the minimum information necessary to match an xid
> against a page to determine a hit, and a count of hits.  Additionally
> we keep a separate small array (say 100 entries) of xids (ints) that
> we write into on a cache miss.
>
> So, cache lookup algorithm basically is:
> scan clog cache headers and check for hit
> if found (xid in range covered in clog page),
>  header.hit_count++;
> else
>  miss_array[miss_count++] = xid;
>
> A cache hit is defined as getting useful information from the page,
> that is, a transaction being committed or invalid.
>
> When the miss array fills, we sort it and determine the most
> commonly hit clog page that is not in the cache and use that
> information to replace pages in the cache if necessary, then reset the
> counts. Maybe we can add a minimum threshold of hits, say 5-10% of
> the miss_array size, for a page to be deemed interesting enough to be
> loaded into the cache.
>
> Interaction w/set hint bits:
> *) If a clog lookup faults through the cache, we basically keep the
> current behavior.  That is, the hint bits are set and the page is
> marked BM_DIRTY and the hint bits get written back.
>
> *) If we get a clog cache hit, that is the hint bits are not set but
> we pulled the transaction status from the cache, the hint bits are
> recorded on the page *but the page is not written back*, at least not
> on a hint bit basis alone.  This behavior branch is more or less the
> BM_UNTIDY as suggested by Haas (see archives), except it's only seen
> in 'cache hit' scenarios.  We are not writing pages back because the
> cache is suggesting there is little or no benefit in writing them back.
>
> Thus, if a single backend is scanning a lot of pages with transactions
> touching a very small number of clog pages, hint bits are generally
> not written back because they are not needed and in fact not helping.
> However, if the xids are spread around a large number of clog pages, we
> get the current behavior more or less (plus the overhead of cache
> maintenance).
>
> With the current code base, hint bits are very beneficial when the xid
> entropy is high and the number of repeated scans is high, and not so
> good when the xid entropy is low and the number of repeated scans is
> low.  The process local cache attempts to redress this without
> disadvantaging the already good cases.  Furthermore, if it can be
> proven that the cache overhead is epsilon, it's pretty unlikely to
> negatively impact anyone; at least, that's my hope.  Traffic
> to clog will be reduced (although not much, since I'd wager the current
> 'last xid' cache works pretty well), but i/o should be reduced, in
> some cases quite significantly, for a tiny cpu cost (although that
> remains to be proven).
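
For illustration, the lookup and replacement steps quoted above could
be sketched in C roughly as follows. This is only a sketch under
stated assumptions, not existing PostgreSQL code: every type and
function name is hypothetical, the 8kB of status bits per cached page
is omitted, a hit is simplified to "the xid's page is in the cache",
and the eviction choice (first empty slot, else the slot with the
fewest hits) is an assumption, since the proposal does not specify one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; illustrative only. */
#define XACTS_PER_CLOG_PAGE 32768u /* 8192-byte page, 2 status bits per xid */
#define CACHE_PAGES 4
#define MISS_ARRAY_SIZE 100

typedef struct {
    bool     valid;      /* slot holds a page */
    uint32_t pageno;     /* clog page covered by this slot */
    uint32_t hit_count;  /* hits since the last reset */
} ClogCacheHeader;

typedef struct {
    ClogCacheHeader hdr[CACHE_PAGES];
    uint32_t miss_array[MISS_ARRAY_SIZE];
    int      miss_count;
} LocalClogCache;

static uint32_t xid_to_page(uint32_t xid) { return xid / XACTS_PER_CLOG_PAGE; }

/* Scan the headers: return the slot index on a hit, or record the xid
 * in the miss array and return -1. The caller must drain the miss
 * array (pick_page_to_load) once it is full. */
static int cache_lookup(LocalClogCache *c, uint32_t xid)
{
    uint32_t pageno = xid_to_page(xid);
    for (int i = 0; i < CACHE_PAGES; i++) {
        if (c->hdr[i].valid && c->hdr[i].pageno == pageno) {
            c->hdr[i].hit_count++;
            return i;
        }
    }
    assert(c->miss_count < MISS_ARRAY_SIZE);
    c->miss_array[c->miss_count++] = xid;
    return -1;
}

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *) a, y = *(const uint32_t *) b;
    return (x > y) - (x < y);
}

/* Sort the misses (sorting by xid also groups them by page), find the
 * most frequently missed page, and reset the miss array. Returns true
 * and sets *pageno only if that page reached min_hits misses. */
static bool pick_page_to_load(LocalClogCache *c, int min_hits, uint32_t *pageno)
{
    int best_run = 0, run = 0;
    uint32_t best_page = 0;

    qsort(c->miss_array, (size_t) c->miss_count, sizeof(uint32_t), cmp_u32);
    for (int i = 0; i < c->miss_count; i++) {
        uint32_t p = xid_to_page(c->miss_array[i]);
        run = (i > 0 && p == xid_to_page(c->miss_array[i - 1])) ? run + 1 : 1;
        if (run > best_run) { best_run = run; best_page = p; }
    }
    c->miss_count = 0;
    if (best_run < min_hits)
        return false;
    *pageno = best_page;
    return true;
}

/* Assumed eviction policy: first empty slot, else fewest hits.
 * Hit counts are reset afterwards, as in the proposal. */
static int install_page(LocalClogCache *c, uint32_t pageno)
{
    int victim = 0;
    for (int i = 0; i < CACHE_PAGES; i++) {
        if (!c->hdr[i].valid) { victim = i; break; }
        if (c->hdr[i].hit_count < c->hdr[victim].hit_count) victim = i;
    }
    c->hdr[victim].valid = true;
    c->hdr[victim].pageno = pageno;
    for (int i = 0; i < CACHE_PAGES; i++)
        c->hdr[i].hit_count = 0;
    return victim;
}
```

With a 100-entry miss array, the suggested 5-10% threshold would
correspond to a min_hits of 5 to 10.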

A couple of details I missed:
*) clog lookups that return no cacheable information will not have
their xid inserted into the 'missed' array -- this will prevent a clog
page returning 'in progress' type states for transactions from pushing
out pages that are returning useful information. In other words, an
in progress transaction is neither a hit nor a miss from the point of
view of the cache -- it's nothing.

*) If we fault to the clog and get useful information
(commit/invalid), and the clog page is already cached, then either the
particular bit is set in the cached page or the entire cached page is
refreshed (not sure which is the better way to go yet).

*) The clog cache might be useful in other places, like the page
eviction hint bit setting scenarios I mentioned earlier. In
non-bgwriter scenarios it's almost certainly a win to at least check the
cache and set hint bits for BM_HEAP pages since you are leveraging the
work already paid during scans. In the bgwriter case, you would have
to build the cache by checking clog pages. I'm somewhat skeptical that
this would actually help the bgwriter though, since it involves
tradeoffs that are hard to estimate.

*) Maybe the shared buffer cache currently being maintained over the
clog can be scrapped. I'm going to leave it alone for now, but I'm
quite skeptical it provides much benefit even without a process-local
cache. clog pages have a very nice property: you don't have to
worry about what else is going on in other processes, and thus there
are no complicated locking or invalidation issues when considering
cache structure. IMNSHO -- this makes a local cache a much better fit
even if you have to keep it smaller for memory usage reasons.
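
To make the rules above concrete, here is a rough C sketch of how a
lookup result might be turned into actions. It is an illustration
under assumptions, not a definitive implementation: all names are
hypothetical, 'invalid' is taken to mean aborted, and sub-committed is
treated like in-progress (no cacheable information).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; illustrative only. */
typedef enum {
    XS_IN_PROGRESS,
    XS_COMMITTED,
    XS_ABORTED,        /* assumed meaning of "invalid" */
    XS_SUB_COMMITTED   /* assumed to be non-cacheable, like in-progress */
} XactStatus;

typedef struct {
    bool set_hint_bits;       /* record the status on the heap page */
    bool mark_dirty;          /* force write-back of the heap page */
    bool record_miss;         /* append the xid to the miss array */
    bool refresh_cached_page; /* update the stale local clog page */
} HintAction;

/* Only final states count as useful information. */
static bool status_is_cacheable(XactStatus s)
{
    return s == XS_COMMITTED || s == XS_ABORTED;
}

/*
 * answered_by_cache: the status came from the local clog cache.
 * page_in_cache:     the xid's clog page is held in the local cache
 *                    (it can be held yet stale, forcing a clog fault).
 */
static HintAction decide_hint_action(XactStatus status,
                                     bool answered_by_cache,
                                     bool page_in_cache)
{
    HintAction a = { false, false, false, false };

    /* In-progress style states are neither a hit nor a miss. */
    if (!status_is_cacheable(status))
        return a;

    a.set_hint_bits = true;
    if (answered_by_cache)
        return a;             /* cache hit: hint set, page not dirtied */

    /* Faulted through to the clog: keep the current behavior. */
    a.mark_dirty = true;
    if (page_in_cache)
        a.refresh_cached_page = true;
    else
        a.record_miss = true;
    return a;
}
```

Whether refresh_cached_page means setting the single status bit or
reloading the whole page is left open here, as it is above.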

merlin
