Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

From: Yura Sokolov <funny(dot)falcon(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Date: 2018-03-10 02:11:27
Message-ID: 1b83e27b-5704-4666-f0ff-ccd1274f6d17@gmail.com
Lists: pgsql-hackers

08.03.2018 03:42, Tomas Vondra wrote:
> On 03/06/2018 06:23 AM, Yura Sokolov wrote:
>> 05.03.2018 18:00, Tom Lane wrote:
>>> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>>>> Snapshots are static (we don't really add new XIDs into existing ones,
>>>> right?), so why don't we simply sort the XIDs and then use bsearch to
>>>> lookup values? That should fix the linear search, without need for any
>>>> local hash table.
>>>
>>> +1 for looking into that, since it would avoid adding any complication
>>> to snapshot copying. In principle, with enough XIDs in the snap, an
>>> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
>>> dubious that we are often in the range where that would matter.
>>> We do need to worry about the cost of snapshot copying, too.
>>
>> Sorting is the first thing I tried. Unfortunately, sorting itself
>> eats too much CPU. Filling the hash table is much faster.
>>
>
> I've been interested in the sort-based approach, so I've spent a bit of
> time hacking on it (patch attached). It's much less invasive than the
> hash table, but Yura is right that the hash table gives better results.
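
For readers following the thread, the sort-plus-bsearch idea being compared
here looks roughly like the sketch below. This is only an illustration, not
the attached patch; the xip_sorted flag and the XidInXipSorted name are
hypothetical additions to SnapshotData made up for the example.

    #include "postgres.h"

    #include <stdlib.h>             /* bsearch() */

    #include "utils/snapshot.h"     /* SnapshotData */

    /*
     * Sketch: sort the snapshot's xip array lazily on the first lookup, then
     * binary-search it.  Plain uint32 ordering is enough because the same
     * comparator is used for sorting and searching and we only test
     * membership.
     */
    static int
    xid_cmp(const void *a, const void *b)
    {
        TransactionId xa = *(const TransactionId *) a;
        TransactionId xb = *(const TransactionId *) b;

        return (xa < xb) ? -1 : ((xa > xb) ? 1 : 0);
    }

    static bool
    XidInXipSorted(Snapshot snapshot, TransactionId xid)
    {
        if (!snapshot->xip_sorted)      /* hypothetical flag */
        {
            pg_qsort(snapshot->xip, snapshot->xcnt,
                     sizeof(TransactionId), xid_cmp);
            snapshot->xip_sorted = true;
        }

        return bsearch(&xid, snapshot->xip, snapshot->xcnt,
                       sizeof(TransactionId), xid_cmp) != NULL;
    }

The sort is paid once per snapshot and each subsequent lookup is O(log xcnt),
versus the roughly O(1) probe of the hash-table version.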
>
> I've tried to measure the benefits using the same script I shared on
> Tuesday, but I kept getting really strange numbers. In fact, I've been
> unable to even reproduce the results I shared back then. And after a bit
> of head-scratching I think the script is useless - it can't possibly
> generate snapshots with many XIDs because all the update threads sleep
> for exactly the same time. And the first one to sleep is the first one to
> wake up and commit, so most of the other XIDs are above xmax (and thus not
> included in the snapshot). I have no idea how I got those earlier numbers :-/
>
> But with this insight, I quickly modified the script to also consume
> XIDs from another thread (which simply calls txid_current). With that I'm
> getting snapshots with as many XIDs as there are UPDATE clients (this
> time I actually checked that using gdb).
>
> And for a 60-second run the tps results look like this (see the attached
> chart, showing the same data):
>
>     writers    master      hash      sort
>     --------------------------------------
>          16      1068      1057      1070
>          32      1005       995      1033
>          64      1064      1042      1117
>         128      1058      1021      1051
>         256       977      1004       928
>         512       773       935       808
>         768       576       815       670
>        1000       413       752       573
>
> The sort certainly does improve performance compared to master, but it's
> only about half of the hashtable improvement.
>
> I don't know how much this simple workload resembles the YCSB benchmark,
> but I seem to recall it touches individual rows. In that case the sort is
> likely to fare worse, because pg_qsort is more expensive than building the
> hash table.
>
> So I agree with Yura that the sort is not a viable alternative to the
> hash table in this case.
>
>> Can I rely on snapshots being static? Maybe then there is no need
>> for a separate raw representation and hash table. I may try to fill the
>> hash table directly in GetSnapshotData instead of filling it lazily.
>> Though that will increase the time spent under the lock, so it is
>> debatable and should be carefully measured.
>>
>
> Yes, I think you can rely on snapshots not being modified later. That's
> pretty much the definition of a snapshot.
>
> You may do that in GetSnapshotData, but you certainly can't do it in
> the for loop there. Not only do we not want to add more work there, but
> you also don't know the number of XIDs in that loop. And we don't want to
> build hash tables for a small number of XIDs.
>
> One reason against building the hash table in GetSnapshotData is that
> we'd build it even when the snapshot is never queried. Or when it is
> queried, but we only need to check xmin/xmax.
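
To make that trade-off concrete, the lazy variant being discussed looks
roughly like the sketch below. It is an illustration only, not the attached
patch: the xiph/xiph_size fields, the 64-XID cut-off, the hash function, and
the allocation context are all assumptions made for the example.

    #include "postgres.h"

    #include "access/transam.h"     /* TransactionIdEquals, InvalidTransactionId */
    #include "utils/memutils.h"     /* TopTransactionContext */
    #include "utils/snapshot.h"     /* SnapshotData */

    #define XIP_HASH_MIN_XIDS   64  /* assumed cut-off for the linear scan */

    /* murmur3 finalizer: a cheap, well-mixed 32-bit hash */
    static inline uint32
    hash_xid(TransactionId xid)
    {
        uint32      h = (uint32) xid;

        h ^= h >> 16;
        h *= 0x85ebca6b;
        h ^= h >> 13;
        h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }

    /*
     * Sketch: build an open-addressing table on the first probe of a large
     * snapshot, so GetSnapshotData stays untouched and snapshots that are
     * never queried pay nothing.  xiph/xiph_size are hypothetical fields and
     * the allocation context is simplified for the sketch.
     */
    static bool
    XidInXip(Snapshot snapshot, TransactionId xid)
    {
        uint32      mask;
        uint32      pos;
        uint32      i;

        /* Small snapshots: a linear scan beats building anything. */
        if (snapshot->xcnt < XIP_HASH_MIN_XIDS)
        {
            for (i = 0; i < snapshot->xcnt; i++)
                if (TransactionIdEquals(xid, snapshot->xip[i]))
                    return true;
            return false;
        }

        /* First probe: build the table, sized to a power of two >= 2 * xcnt. */
        if (snapshot->xiph == NULL)
        {
            uint32      size = 2;

            while (size < snapshot->xcnt * 2)
                size *= 2;

            /* Zeroed slots double as "empty": xip never contains xid 0. */
            snapshot->xiph = MemoryContextAllocZero(TopTransactionContext,
                                                    size * sizeof(TransactionId));
            snapshot->xiph_size = size;

            for (i = 0; i < snapshot->xcnt; i++)
            {
                pos = hash_xid(snapshot->xip[i]) & (size - 1);
                while (snapshot->xiph[pos] != InvalidTransactionId)
                    pos = (pos + 1) & (size - 1);
                snapshot->xiph[pos] = snapshot->xip[i];
            }
        }

        /* Probe with the same hash and linear open addressing. */
        mask = snapshot->xiph_size - 1;
        pos = hash_xid(xid) & mask;
        while (snapshot->xiph[pos] != InvalidTransactionId)
        {
            if (TransactionIdEquals(snapshot->xiph[pos], xid))
                return true;
            pos = (pos + 1) & mask;
        }
        return false;
    }

Probes that are already resolved by the xmin/xmax checks in XidInMVCCSnapshot
never reach this function, which is exactly the case the lazy build keeps free.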

Thank you for the analysis, Tomas.

Stephen is right about the bug in snapmgr.c.
The attached version fixes the bug and also simplifies XidInXip a bit.

With regards,
Sokolov Yura.

Attachment: 0001-Make-hash-table-for-xip-in-XidInMVCCSnapshot_v4.patch (text/x-patch, 11.5 KB)
