Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

From: Yura Sokolov <funny(dot)falcon(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Date: 2018-04-01 17:53:06
Message-ID: a7b61e78-8c92-6f06-5fbb-09f51f017421@gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

17.03.2018 03:36, Tomas Vondra пишет:
>
> On 03/17/2018 12:03 AM, Yura Sokolov wrote:
>> 16.03.2018 04:23, Tomas Vondra пишет:
>>>
>>> ...
>>>
>>> OK, a few more comments.
>>>
>>> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
>>> my_log2 (that is, we could just call the existing function).
>>
>> Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
>> more call be more expensive than loop itself?
>>
>
> I very much doubt it there would be a measurable difference. Firstly,
> function calls are not that expensive, otherwise we'd be running around
> and removing function calls from the hot paths. Secondly, the call only
> happens with many XIDs, and in that case the cost should be out-weighted
> by faster lookups. And finally, the function does stuff that seems far
> more expensive than a single function call (for example allocation,
> which may easily trigger a malloc).
>
> In fact, in the interesting cases it's pretty much guaranteed to hit a
> malloc, because the number of backend processes needs to be high enough
> (say, 256 or more), which means
>
> GetMaxSnapshotSubxidCount()
>
> will translate to something like
>
> ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
> = (64 + 1) * 256
> = 16640
>
> and because XIDs are 4B each, that's ~65kB of memory (even ignoring the
> ExtendXipSizeForHash business). And aset.c treats chunks larger than 8kB
> as separate blocks, that are always malloc-ed independently.
>
> But I may be missing something, and the extra call actually makes a
> difference. But I think the onus of proving that is on you, and the
> default should be not to duplicate code.
>
>>> 2) Apparently xhlog/subxhlog fields are used for two purposes - to
>>> store log2 of the two XID counts, and to remember if the hash table
>>> is built. That's a bit confusing (at least it should be explained
>>> in a comment) but most importantly it seems a bit unnecessary.
>>
>> Ok, I'll add comment.
>>
>>> I assume it was done to save space, but I very much doubt that
>>> makes any difference. So we could easily keep a separate flag. I'm
>>> pretty sure there are holes in the SnapshotData struct, so we could
>>> squeeze it the flags in one of those.
>>
>> There's hole just right after xhlog. But it will be a bit strange to
>> move fields around.
>>
>
> Is SnapshotData really that sensitive to size increase? I have my doubts
> about that, TBH. The default stance should be to make the code easy to
> understand. That is, we should only move fields around if it actually
> makes a difference.
>
>>> But do we even need a separate flag? We could just as easily keep
>>> the log2 fields set to 0 and only set them after actually building
>>> the hash table.
>>
>> There is a need to signal that there is space for hash. It is not
>> always allocated. iirc, I didn't cover the case where snapshot were
>> restored from file, and some other place or two.
>> Only if all places where snapshot is allocated are properly modified
>> to allocate space for hash, then flag could be omitted, and log2
>> itself used as a flag.
>>
>
> Hmmm, that makes it a bit inconsistent, I guess ... why not to do the
> same thing on all those places?
>
>>> Or even better, why not to store the mask so that XidInXip does not
>>> need to compute it over and over (granted, that's uint32 instead
>>> of just uint8, but I don't think SnapshotData is particularly
>>> sensitive to this, especially considering how much larger the xid
>>> hash table is).
>>
>> I don't like unnecessary sizeof struct increase. And I doubt that
>> computation matters. I could be mistaken though, because it is hot
>> place. Do you think it will significantly improve readability?
>>
>
> IMHO the primary goal is to make the code easy to read and understand,
> and only optimize struct size if it actually makes a difference. We have
> no such proof here, and I very much doubt you'll be able to show any
> difference because even with separate flags pahole says this:
>
> struct SnapshotData {
> SnapshotSatisfiesFunc satisfies; /* 0 8 */
> TransactionId xmin; /* 8 4 */
> TransactionId xmax; /* 12 4 */
> TransactionId * xip; /* 16 8 */
> uint32 xcnt; /* 24 4 */
> uint8 xhlog; /* 28 1 */
>
> /* XXX 3 bytes hole, try to pack */
>
> TransactionId * subxip; /* 32 8 */
> int32 subxcnt; /* 40 4 */
> uint8 subxhlog; /* 44 1 */
> bool suboverflowed; /* 45 1 */
> bool takenDuringRecovery; /* 46 1 */
> bool copied; /* 47 1 */
> CommandId curcid; /* 48 4 */
> uint32 speculativeToken; /* 52 4 */
> uint32 active_count; /* 56 4 */
> uint32 regd_count; /* 60 4 */
> /* --- cacheline 1 boundary (64 bytes) --- */
> pairingheap_node ph_node; /* 64 24 */
> TimestampTz whenTaken; /* 88 8 */
> XLogRecPtr lsn; /* 96 8 */
>
> /* size: 104, cachelines: 2, members: 19 */
> /* sum members: 101, holes: 1, sum holes: 3 */
> /* last cacheline: 40 bytes */
> };
>
> That is, the whole structure is only 104B - had it got over 128B, it
> might have made a difference due to extra cacheline. In fact, even if
> you make the xhlog and subxhlog uint32 (which would be enough to store
> the mask, which is what simplehash does), it'd be only 120B.
>
> Based on this I'd dare to say neither of those changes would have
> negative impact.
>
> So I suggest to split each of the "combined" fields into a simple bool
> flag ("hash table built") and a uint32 mask, and see if it simplifies
> the code (I believe it will) and what impact it has (I believe there
> will be none).
>
> Sorry if this seems like a bikeshedding ...
>
>
> regards
>

I've made version "without bit magic" as you suggested (in attach).
I can't reliably say is there any performance difference with previous
version.

With regards,
Yura.

Attachment Content-Type Size
0001-Make-hash-table-for-xip-in-XidInMVCCSnapshot_v5.patch text/x-patch 11.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-04-01 17:55:05 Re: Rethinking -L switch handling and construction of LDFLAGS
Previous Message Andres Freund 2018-04-01 17:43:38 Re: Rethinking -L switch handling and construction of LDFLAGS