Process local hint bit cache

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Process local hint bit cache
Date: 2011-03-29 21:34:16
Message-ID: AANLkTi=nJ_QyE7Ape5Ja+o3f=jNRXmNeOuWjAOFdWre2@mail.gmail.com
Lists: pgsql-hackers

In a previous thread
(http://postgresql.1045698.n5.nabble.com/Set-hint-bits-upon-eviction-from-BufMgr-td4264323.html)
I was playing with the idea of granting the bgwriter the ability to
do last-chance hint bit setting before evicting the page. I still
think that might be a good idea, especially in scenarios where you
get to set the PD_ALL_VISIBLE bit when writing out the page, which is
a huge win in some cases. That said, it bugged me that it didn't
really fix the general case of large data insertions and the
subsequent i/o problems from setting the bits, which is my primary
objective in the short term.

So I went back to the drawing board, reviewed the archives, and came
up with a new proposal. I'd like to see a process local clog page
cache of around 1-4 pages (8-32kb typically) that would replace the
current TransactionLogFetch last xid cache. It should be small,
because I doubt more would really help much (see below) and we want to
keep this data in the tight cpu caches since it's going to be
constantly scanned.
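
For a sense of scale, here is a small C sketch of how an xid maps to a
clog page and how many xids a 1-4 page cache would cover. It assumes
the default 8 kB block size and clog's two status bits per
transaction; the SKETCH_* names and both functions are made up for
illustration and are not taken from the PostgreSQL source.

```c
#include <stdint.h>

/* Assumptions: default 8 kB block size, 2 status bits per transaction. */
#define SKETCH_BLCKSZ          8192
#define SKETCH_XACTS_PER_BYTE  4
#define SKETCH_XACTS_PER_PAGE  (SKETCH_BLCKSZ * SKETCH_XACTS_PER_BYTE)

/* Number of the clog page that holds the status bits for xid. */
static uint32_t sketch_xid_to_page(uint32_t xid)
{
    return xid / SKETCH_XACTS_PER_PAGE;
}

/* Number of xids whose status fits in a cache of npages clog pages. */
static uint32_t sketch_cache_coverage(uint32_t npages)
{
    return npages * SKETCH_XACTS_PER_PAGE;
}
```

Under those assumptions one page covers 32768 consecutive xids, so a
4 page cache covers 131072 of them.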

The cache itself will be the clog pages and a small header per page
which will contain the minimum information necessary to match an xid
against a page to determine a hit, and a count of hits. Additionally
we keep a separate small array (say 100 entries) of type xid (int)
that we write into on a cache miss.

So, the cache lookup algorithm basically is:
  scan clog cache headers and check for hit
  if found (xid in range covered in clog page),
    header.hit_count++;
  else
    miss_array[miss_count++] = xid;

A cache hit is defined as getting useful information from the page,
that is, a transaction being committed or invalid.
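
The lookup described above could be sketched in C roughly as follows.
This is only an illustration under stated assumptions: the struct
layout, the names (ClogCache, cache_lookup, XS_*) and the sizes are
hypothetical, the status encoding assumes two bits per xid with
1 = committed and 2 = aborted, and an xid whose page is cached but
which is still in progress is treated as "not a hit, not recorded as
a miss", which is one possible reading of the hit definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define XACTS_PER_PAGE   32768   /* assumes 8 kB pages, 2 bits per xid */
#define CACHE_PAGES      4       /* hypothetical cache size */
#define MISS_ARRAY_SIZE  100     /* "say 100" from the text */

enum { XS_IN_PROGRESS = 0, XS_COMMITTED = 1, XS_ABORTED = 2 };

typedef struct {
    bool     valid;                      /* slot holds a page */
    uint32_t pageno;                     /* which clog page this is */
    uint32_t hit_count;                  /* hits since last reset */
    uint8_t  data[XACTS_PER_PAGE / 4];   /* copy of the clog page */
} ClogCachePage;

typedef struct {
    ClogCachePage pages[CACHE_PAGES];
    uint32_t      miss_array[MISS_ARRAY_SIZE];
    int           miss_count;
} ClogCache;

/* Extract the 2-bit status of xid from a cached page. */
static int page_status(const ClogCachePage *p, uint32_t xid)
{
    uint32_t off = xid % XACTS_PER_PAGE;
    return (p->data[off / 4] >> ((off % 4) * 2)) & 0x3;
}

/*
 * Returns true and sets *status on a cache hit. If the page is not
 * cached, the xid is recorded in the miss array and false is returned.
 * Replacement when the miss array fills up is left out of this sketch.
 */
static bool cache_lookup(ClogCache *c, uint32_t xid, int *status)
{
    uint32_t pageno = xid / XACTS_PER_PAGE;

    for (int i = 0; i < CACHE_PAGES; i++) {
        ClogCachePage *p = &c->pages[i];
        if (p->valid && p->pageno == pageno) {
            int s = page_status(p, xid);
            if (s == XS_COMMITTED || s == XS_ABORTED) {
                p->hit_count++;
                *status = s;
                return true;
            }
            return false;   /* page cached, but nothing useful yet */
        }
    }
    if (c->miss_count < MISS_ARRAY_SIZE)
        c->miss_array[c->miss_count++] = xid;
    return false;
}
```

On a false return the caller would fall through to the normal clog
lookup, so the cache never has to be authoritative.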

When the miss count array fills, we sort it and determine the most
commonly hit clog page that is not in the cache and use that
information to replace pages in the cache if necessary, then reset the
counts. Maybe we can add a minimum threshold of hits, say 5-10% of the
miss_array size, for a page to be deemed interesting enough to be
loaded into the cache.
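
One way to process a full miss array is sketched below in C: sort the
xids, then count runs that fall on the same clog page. The function
name, the min_hits parameter and the page size constant are
assumptions for illustration, and xid wraparound is ignored. It also
assumes that only xids whose page was not cached were recorded, so
every candidate is by construction a page that is not in the cache.

```c
#include <stdint.h>
#include <stdlib.h>

#define XACTS_PER_PAGE 32768   /* assumes 8 kB pages, 2 bits per xid */

/* qsort comparator for unsigned 32-bit xids (no wraparound handling). */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *) a;
    uint32_t y = *(const uint32_t *) b;
    return (x > y) - (x < y);
}

/*
 * Sorts misses[] in place and finds the clog page that accounts for
 * the most entries. Returns 1 and sets *best_page if that page was
 * missed at least min_hits times, otherwise returns 0.
 */
static int most_missed_page(uint32_t *misses, int n, int min_hits,
                            uint32_t *best_page)
{
    int best_count = 0;
    int run = 0;

    qsort(misses, (size_t) n, sizeof(uint32_t), cmp_u32);

    for (int i = 0; i < n; i++) {
        uint32_t page = misses[i] / XACTS_PER_PAGE;

        /* Sorted xids put entries for the same page next to each other. */
        if (i > 0 && page == misses[i - 1] / XACTS_PER_PAGE)
            run++;
        else
            run = 1;

        if (run > best_count) {
            best_count = run;
            *best_page = page;
        }
    }
    return best_count > 0 && best_count >= min_hits;
}
```

With a 100 entry miss array, the 5-10% threshold mentioned above
would correspond to a min_hits of 5 to 10. Which cached page to evict
(for example the one with the lowest hit_count) is a separate choice
that this sketch does not make.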

Interaction w/set hint bits:
*) If a clog lookup faults through the cache, we basically keep the
current behavior. That is, the hint bits are set, the page is
marked BM_DIRTY, and the hint bits get written back.

*) If we get a clog cache hit, that is, the hint bits are not set but
we pulled the transaction status from the cache, the hint bits are
recorded on the page *but the page is not written back*, at least not
on a hint bit basis alone. This behavior branch is more or less the
BM_UNTIDY idea suggested by Haas (see archives), except it's only seen
in 'cache hit' scenarios. We are not writing pages back because the
cache is suggesting there is little or no benefit to writing them back.
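
The two branches above boil down to a single decision when a hint bit
is set. A toy C model, with a made-up SketchBuffer type standing in
for the real buffer header and its flags, might look like this:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shared buffer; not the real BufferDesc. */
typedef struct {
    bool hint_set;   /* hint bit recorded on the in-memory page */
    bool dirty;      /* page will be written back (BM_DIRTY in the text) */
} SketchBuffer;

/*
 * Record a hint bit. Only mark the page dirty when the status had to
 * come from clog itself (a cache miss); on a cache hit the bit is set
 * in memory but does not by itself force a write.
 */
static void sketch_set_hint_bit(SketchBuffer *buf, bool status_from_cache)
{
    buf->hint_set = true;
    if (!status_from_cache)
        buf->dirty = true;
}
```

A page that is already dirty for other reasons stays dirty, so hint
bits set on a cache hit still reach disk whenever the page is written
anyway.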

Thus, if a single backend is scanning a lot of pages with transactions
touching a very small number of clog pages, hint bits are generally
not written back because they are not needed and in fact not helping.
However, if the xids are spread around a large number of clog pages, we
get the current behavior more or less (plus the overhead of cache
maintenance).

With the current code base, hint bits are very beneficial when the xid
entropy is high and the number of repeated scans is high, and not so
good when the xid entropy is low and the number of repeated scans is
low. The process local cache attempts to redress this without
disadvantaging the already good cases. Furthermore, if it can be
proven that the cache overhead is epsilon, it's pretty unlikely to
negatively impact anyone; at least, that's my hope. Traffic
to clog will be reduced (although not much, since I'd wager the current
'last xid' cache works pretty well), but i/o should be reduced, in
some cases quite significantly, for a tiny cpu cost (although that
remains to be proven).

merlin
