Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Yura Sokolov <funny(dot)falcon(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Date: 2018-03-17 00:36:16
Message-ID: 5d4e500b-8697-e474-b507-93319566a00c@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 03/17/2018 12:03 AM, Yura Sokolov wrote:
> 16.03.2018 04:23, Tomas Vondra пишет:
>>
>> ...
>>
>> OK, a few more comments.
>>
>> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
>> my_log2 (that is, we could just call the existing function).
>
> Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
> more call be more expensive than loop itself?
>

I very much doubt it there would be a measurable difference. Firstly,
function calls are not that expensive, otherwise we'd be running around
and removing function calls from the hot paths. Secondly, the call only
happens with many XIDs, and in that case the cost should be out-weighted
by faster lookups. And finally, the function does stuff that seems far
more expensive than a single function call (for example allocation,
which may easily trigger a malloc).

In fact, in the interesting cases it's pretty much guaranteed to hit a
malloc, because the number of backend processes needs to be high enough
(say, 256 or more), which means

GetMaxSnapshotSubxidCount()

will translate to something like

((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
= (64 + 1) * 256
= 16640

and because XIDs are 4B each, that's ~65kB of memory (even ignoring the
ExtendXipSizeForHash business). And aset.c treats chunks larger than 8kB
as separate blocks, that are always malloc-ed independently.

But I may be missing something, and the extra call actually makes a
difference. But I think the onus of proving that is on you, and the
default should be not to duplicate code.

>> 2) Apparently xhlog/subxhlog fields are used for two purposes - to
>> store log2 of the two XID counts, and to remember if the hash table
>> is built. That's a bit confusing (at least it should be explained
>> in a comment) but most importantly it seems a bit unnecessary.
>
> Ok, I'll add comment.
>
>> I assume it was done to save space, but I very much doubt that
>> makes any difference. So we could easily keep a separate flag. I'm
>> pretty sure there are holes in the SnapshotData struct, so we could
>> squeeze it the flags in one of those.
>
> There's hole just right after xhlog. But it will be a bit strange to
> move fields around.
>

Is SnapshotData really that sensitive to size increase? I have my doubts
about that, TBH. The default stance should be to make the code easy to
understand. That is, we should only move fields around if it actually
makes a difference.

>> But do we even need a separate flag? We could just as easily keep
>> the log2 fields set to 0 and only set them after actually building
>> the hash table.
>
> There is a need to signal that there is space for hash. It is not
> always allocated. iirc, I didn't cover the case where snapshot were
> restored from file, and some other place or two.
> Only if all places where snapshot is allocated are properly modified
> to allocate space for hash, then flag could be omitted, and log2
> itself used as a flag.
>

Hmmm, that makes it a bit inconsistent, I guess ... why not to do the
same thing on all those places?

>> Or even better, why not to store the mask so that XidInXip does not
>> need to compute it over and over (granted, that's uint32 instead
>> of just uint8, but I don't think SnapshotData is particularly
>> sensitive to this, especially considering how much larger the xid
>> hash table is).
>
> I don't like unnecessary sizeof struct increase. And I doubt that
> computation matters. I could be mistaken though, because it is hot
> place. Do you think it will significantly improve readability?
>

IMHO the primary goal is to make the code easy to read and understand,
and only optimize struct size if it actually makes a difference. We have
no such proof here, and I very much doubt you'll be able to show any
difference because even with separate flags pahole says this:

struct SnapshotData {
SnapshotSatisfiesFunc satisfies; /* 0 8 */
TransactionId xmin; /* 8 4 */
TransactionId xmax; /* 12 4 */
TransactionId * xip; /* 16 8 */
uint32 xcnt; /* 24 4 */
uint8 xhlog; /* 28 1 */

/* XXX 3 bytes hole, try to pack */

TransactionId * subxip; /* 32 8 */
int32 subxcnt; /* 40 4 */
uint8 subxhlog; /* 44 1 */
bool suboverflowed; /* 45 1 */
bool takenDuringRecovery; /* 46 1 */
bool copied; /* 47 1 */
CommandId curcid; /* 48 4 */
uint32 speculativeToken; /* 52 4 */
uint32 active_count; /* 56 4 */
uint32 regd_count; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
pairingheap_node ph_node; /* 64 24 */
TimestampTz whenTaken; /* 88 8 */
XLogRecPtr lsn; /* 96 8 */

/* size: 104, cachelines: 2, members: 19 */
/* sum members: 101, holes: 1, sum holes: 3 */
/* last cacheline: 40 bytes */
};

That is, the whole structure is only 104B - had it got over 128B, it
might have made a difference due to extra cacheline. In fact, even if
you make the xhlog and subxhlog uint32 (which would be enough to store
the mask, which is what simplehash does), it'd be only 120B.

Based on this I'd dare to say neither of those changes would have
negative impact.

So I suggest to split each of the "combined" fields into a simple bool
flag ("hash table built") and a uint32 mask, and see if it simplifies
the code (I believe it will) and what impact it has (I believe there
will be none).

Sorry if this seems like a bikeshedding ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-03-17 01:29:31 Re: [PROPOSAL] Shared Ispell dictionaries
Previous Message Yura Sokolov 2018-03-16 23:03:41 Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)