Re: Process local hint bit cache

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Process local hint bit cache
Date: 2011-04-07 18:28:16
Message-ID: BANLkTimt2=yoUE7NK3Zy=NM+n0dvq2iB7A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 4, 2011 at 9:25 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Sun, Apr 3, 2011 at 6:40 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Sat, Apr 2, 2011 at 8:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>>>> On Thu, Mar 31, 2011 at 5:38 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>>>> working on expanding the cache to # xid > 1.
>>>
>>>> patch attached.  this is essentially my original idea except it's
>>>> injected directly into tqual.c as a kind of expansion of the 'last
>>>> seen' concept.  Because the cache loops are inlined and very tight (4
>>>> int compares), it's actually cheaper than jumping out of line into
>>>> clog to test this_int = that_int;
>>>
>>> This seems like a mess.  Why is the criterion for caching commit states
>>> different from whether the hint bit can be set?  And why are you
>>
>> The hint bits are always set.  The page is flagged dirty if and only
>> if you have to dip below the process local cache to get it.  The
>> rationale for this is simple: it caps your downside because unlike
>> hint bit removal or BM_UNTIDY, you only have to dip into clog and pay
>> the i/o once per page -- so that your downside is limited to pre-cache
>> behavior (minus the overhead of maintaining the cache).
>
> I might have missed part of the thrust of your question.  In the
> patch, the commit status is only stored if the LSN for the xid is
> known safe (instead of checking it just before setting the bit as
> SetHintBits does).  This is indeed different, and it was done so as to
> avoid having to a. repeatedly check XLogNeedsFlush or b. cache the xid
> LSN as transam.c does.  Perhaps this is weak sauce, so I think the
> first thing to do is have another go at doing it at the transam level.
>  Stay tuned...

Ok, after having done some more testing, I don't think this is gonna
work without at least some changes to tqual.c. It's exactly the stuff
that was bugging you, or some variant thereof, that needs to happen in
order to avoid penalizing some cases. Letting transam.c handle 100% of
the cache management has the following issues:

*) No way to signal tqual.c to not dirty page if we got a cache hit --
the i/o mitigation strategy rests on being able to do this.
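
To illustrate the idea (names and constants here are hypothetical
stand-ins for the example only, not the actual patch code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for the real buffer state. */
static bool page_dirtied = false;

/* Pretend xids below 1000 are in the process-local commit cache. */
static bool
XidInProcessLocalCache(TransactionId xid)
{
    return xid < 1000;
}

/*
 * Sketch of the proposed behavior: the hint bit is always set on the
 * tuple, but the page is marked dirty only when we had to go past the
 * process-local cache (i.e. touch clog) to learn the commit status.
 */
static void
SetHintBitsSketch(TransactionId xid, uint16_t *infomask, uint16_t hint_bit)
{
    *infomask |= hint_bit;          /* hint bit is always set */
    if (!XidInProcessLocalCache(xid))
        page_dirtied = true;        /* dirty only on a cache miss */
}
```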

*) transam.c cache implementation should also cache transaction LSN
position, which is expensive memory wise for less benefit

*) Lots of non-inline calls. Jumping out of HeapTupleSatisfiesMVCC
into a non-inline function is expensive, at least on my test machine.
Since there are now more cases where you fall through the hint bit
check, you significantly increase the number of out-of-line calls you
make, which eats up all the savings you get -- sometimes more. In
particular:

*) Both TransactionIdIsCurrentTransactionId and
TransactionIdIsInProgress are called *before* TransactionIdDidCommit,
so even if you are going to get a cache 'hit', you still have to pay
for jumping out to these functions and running them.
TransactionIdIsInProgress could possibly be made inline if you put it
under the transam.c cache umbrella, but it still has material cost in
that case.

*) Setting the commit cache if and only if your LSN is known safe is
an important innovation and must be preserved -- if you are getting
lots of 'commit cache hit' tuples, it saves a lot of time not having
to constantly recheck the LSN flush status of the same transactions
over and over.
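
The rule reduces to something like this (XLogNeedsFlushSketch and
flushed_lsn are made-up stand-ins for the real WAL state, just to show
the shape of the check):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical flushed-WAL position standing in for the real one. */
static XLogRecPtr flushed_lsn = 0;

/* Stand-in for XLogNeedsFlush(): true if lsn is past the flush point. */
static bool
XLogNeedsFlushSketch(XLogRecPtr lsn)
{
    return lsn > flushed_lsn;
}

/*
 * Sketch of the rule: a commit goes into the process-local cache only
 * when its commit-record LSN is already on disk.  A later cache hit
 * can then skip the flush-status recheck entirely.
 */
static bool
CommitIsCacheable(XLogRecPtr commit_lsn)
{
    return !XLogNeedsFlushSketch(commit_lsn);
}
```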

Here are the times I'm seeing on my workstation (all timings are
somewhat noisy, accurate to within 10 ms or so).
test:
create table v as select generate_series(1,500000) v;
select count(*) from v;

stock:
run 1: 3000 ms (setting hint bits)
run 2+: 665 ms (hint bit optimized)

transam.c cache, non-inline:
run 1+: 1000 ms

transam.c cache, inline (made an inline stub that proxies to the
existing function on cache miss):
run 1+: 800 ms

tqual.c cache, checked at TransactionIdDidCommit:
run 1+: 790 ms (1-2 fewer 'if' statements)

tqual.c cache, checked at TransactionIdDidCommit, no LSN flush check:
run 1+: 740 ms

tqual.c cache, checked right after the hint bit, no LSN flush check:
run 1+: 670 ms

The last case is near hint bit performance, because the check is
happening more or less at the same place, and the check itself is a
small number of int math operations (and can be optimized further
still). I haven't yet timed the degenerate case where the commit
cache is receiving tons of misses and is constantly flushing out
pages. I'm not expecting this to be bad, but need to mention this --
I have some ideas about mitigation if it's a problem.

So the question is, what is the way forward? Below is the commit
cache code I'm hooking into various places, with some fixes from the
previous patch. I'm not stuck on the actual implementation of the
cache structure, other than that it has to be completely inline (no
dynahash) and very tight. Where it gets called from is much more
important.

merlin

#define COMMIT_CACHE_BUCKET_COUNT 4
#define COMMIT_CACHE_BUCKET_BLOCKSZ BLCKSZ
#define COMMIT_CACHE_ROLLUPSZ 100
#define COMMIT_CACHE_HIT_THRESHOLD 5

typedef struct
{
    int     XidBucket;
    int     HitCount;
    bool    Clear;
} CommitCacheBucketHeader;

/* I know static definition here is lousy, but it's convenient for
testing purposes */
static CommitCacheBucketHeader CommitCacheBucketList[COMMIT_CACHE_BUCKET_COUNT]
= {{-1, 0, false}, {-1, 0, false}, {-1, 0, false}, {-1, 0, false}};
static char CommitCacheData[COMMIT_CACHE_BUCKET_COUNT][COMMIT_CACHE_BUCKET_BLOCKSZ];
static int CommitCacheBucketRollupList[COMMIT_CACHE_ROLLUPSZ];
static int CommitCacheMissCount = 0;

/* qsort comparison function */
static int
int_cmp(const void *p1, const void *p2)
{
    int     v1 = *((const int *) p1);
    int     v2 = *((const int *) p2);

    if (v1 < v2)
        return -1;
    if (v1 > v2)
        return 1;
    return 0;
}

static void
RollUpCommitCache(void)
{
    int     LastBucket = CommitCacheBucketRollupList[0];
    int     BucketIdx;
    int     HitCount = 0;
    int     Bucket;
    int     BucketMinHits;
    int     BucketMinIdx;
    int     XidBucket;

    qsort(CommitCacheBucketRollupList, COMMIT_CACHE_ROLLUPSZ,
          sizeof(int), int_cmp);

    for (BucketIdx = 0; BucketIdx < COMMIT_CACHE_ROLLUPSZ; BucketIdx++)
    {
        if (CommitCacheBucketRollupList[BucketIdx] == LastBucket)
            HitCount++;
        else
        {
            if (HitCount >= COMMIT_CACHE_HIT_THRESHOLD)
            {
    CheckLastXidBlock:
                /* find the currently least-hit resident bucket */
                BucketMinHits = COMMIT_CACHE_ROLLUPSZ + 1;
                BucketMinIdx = 0;
                for (Bucket = 0; Bucket < COMMIT_CACHE_BUCKET_COUNT; Bucket++)
                {
                    if (CommitCacheBucketList[Bucket].HitCount < BucketMinHits)
                    {
                        BucketMinIdx = Bucket;
                        BucketMinHits = CommitCacheBucketList[Bucket].HitCount;
                    }
                }
                XidBucket = CommitCacheBucketRollupList[BucketIdx];

                /* evict it if the new xid block was hotter */
                if (HitCount > BucketMinHits &&
                    XidBucket != CommitCacheBucketList[BucketMinIdx].XidBucket)
                {
                    CommitCacheBucketList[BucketMinIdx].XidBucket = XidBucket;
                    CommitCacheBucketList[BucketMinIdx].HitCount = HitCount;
                    CommitCacheBucketList[BucketMinIdx].Clear = true;
                }
            }
            HitCount = 1;
            LastBucket = CommitCacheBucketRollupList[BucketIdx];
        }
    }

    /* handle the final run of the sorted list */
    if (HitCount >= COMMIT_CACHE_HIT_THRESHOLD)
    {
        BucketIdx--;
        goto CheckLastXidBlock;
    }

    /* wipe newly-assigned buckets and reset per-interval hit counts */
    for (Bucket = 0; Bucket < COMMIT_CACHE_BUCKET_COUNT; Bucket++)
    {
        if (CommitCacheBucketList[Bucket].Clear)
            memset(CommitCacheData[Bucket], 0, COMMIT_CACHE_BUCKET_BLOCKSZ);

        CommitCacheBucketList[Bucket].Clear = false;
        CommitCacheBucketList[Bucket].HitCount = 0;
    }

    CommitCacheMissCount = 0;
}

static inline bool
IsXidInCommitCacheI(TransactionId xid)
{
    int     XidBucket = xid / COMMIT_CACHE_BUCKET_BLOCKSZ;
    int     Bucket;

    for (Bucket = 0; Bucket < COMMIT_CACHE_BUCKET_COUNT; Bucket++)
    {
        if (CommitCacheBucketList[Bucket].XidBucket == XidBucket)
        {
            /* offsets are relative to the bucket, not the xid space */
            int     ByteOffset = (xid % COMMIT_CACHE_BUCKET_BLOCKSZ) / BITS_PER_BYTE;
            int     BitOffset = xid % BITS_PER_BYTE;

            if (CommitCacheData[Bucket][ByteOffset] & (1 << BitOffset))
            {
                CommitCacheBucketList[Bucket].HitCount++;
                return true;
            }
            break;
        }
    }

    CommitCacheBucketRollupList[CommitCacheMissCount++] = XidBucket;

    if (CommitCacheMissCount == COMMIT_CACHE_ROLLUPSZ)
        RollUpCommitCache();

    return false;
}

static inline void
SetXidInCommitCacheI(TransactionId xid)
{
    int     XidBucket = xid / COMMIT_CACHE_BUCKET_BLOCKSZ;
    int     Bucket;

    for (Bucket = 0; Bucket < COMMIT_CACHE_BUCKET_COUNT; Bucket++)
    {
        if (XidBucket == CommitCacheBucketList[Bucket].XidBucket)
        {
            /* same bucket-relative addressing as the lookup path */
            int     ByteOffset = (xid % COMMIT_CACHE_BUCKET_BLOCKSZ) / BITS_PER_BYTE;
            int     BitOffset = xid % BITS_PER_BYTE;

            CommitCacheData[Bucket][ByteOffset] |= (1 << BitOffset);
            break;
        }
    }
}

static inline bool
TransactionIdDidCommitCacheI(TransactionId transactionId)
{
return IsXidInCommitCacheI(transactionId);
}
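
As an aside, here is the intended xid -> cache slot mapping spelled
out on its own (BLCKSZ is hard-coded to a typical value here purely
for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8
#define BLCKSZ 8192     /* typical PostgreSQL block size; illustration only */
#define COMMIT_CACHE_BUCKET_BLOCKSZ BLCKSZ

typedef uint32_t TransactionId;

/*
 * Map an xid to its cache slot: which xid block (bucket), which byte
 * within that bucket's bitmap, and which bit within that byte.
 */
static void
XidToCacheSlot(TransactionId xid, int *xid_bucket,
               int *byte_offset, int *bit_offset)
{
    *xid_bucket  = xid / COMMIT_CACHE_BUCKET_BLOCKSZ;
    *byte_offset = (xid % COMMIT_CACHE_BUCKET_BLOCKSZ) / BITS_PER_BYTE;
    *bit_offset  = xid % BITS_PER_BYTE;
}
```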
