Re: MultiXact\SLRU buffers configuration

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-29 08:26:02
Message-ID: 018EF641-FC9D-4BA9-A458-6842E27E91C2@yandex-team.ru
Lists: pgsql-hackers

> On 29 March 2021, at 02:15, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Sat, Mar 27, 2021 at 6:31 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>> On 27 March 2021, at 01:26, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>>> , and murmurhash which is inlineable and
>>> branch-free.
>
>> I think pageno is a hash already. Why hash any further? And pages accessed together will have smaller access time due to colocation.
>
> Yeah, if clog_buffers is large enough then it's already a "perfect
> hash", but if it's not then you might get some weird "harmonic"
> effects (not sure if that's the right word), basically higher or lower
> collision rate depending on coincidences in the data. If you apply a
> hash, the collisions should be evenly spread out so at least it'll be
> somewhat consistent. Does that make sense?
As far as I understand, "harmonic" effects only matter when the distribution is unknown. Hashing protects against "periodic" data whose period equals the hash table size. I don't think we need to protect against this case: SLRU data is expected to be localised...
The cost of this protection is having to compute a murmur hash on each SLRU lookup, probably 10-100ns. That doesn't seem like a big deal.
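
A minimal sketch of the trade-off discussed above, not code from the patch set. It compares using pageno directly as the bucket index with passing it through the MurmurHash3 32-bit finalizer first. TABLE_SIZE and both access patterns are assumptions chosen for illustration.

```python
def fmix32(h: int) -> int:
    """MurmurHash3 32-bit finalizer (branch-free integer mixing)."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def identity(pageno: int) -> int:
    """Treat the page number itself as the hash value."""
    return pageno


TABLE_SIZE = 128  # hypothetical number of buckets


def buckets_used(pagenos, hash_fn) -> int:
    """Count the distinct buckets hit by a set of page numbers."""
    return len({hash_fn(p) % TABLE_SIZE for p in pagenos})


# Localised access: 64 consecutive pages, the pattern expected for SLRU data.
localised = list(range(1000, 1064))
# Periodic access: the stride equals the table size, the pathological case.
periodic = [i * TABLE_SIZE for i in range(64)]

if __name__ == "__main__":
    for name, pages in (("localised", localised), ("periodic", periodic)):
        print(name,
              "identity:", buckets_used(pages, identity),
              "fmix32:", buckets_used(pages, fmix32))
```

With the identity mapping, the localised pattern never collides, while the periodic pattern puts every page into one bucket. Mixing spreads the periodic pattern out again at the cost of a few multiplications per lookup.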

> (At some point I figured out that the syscaches have lower collision
> rates and perform better if you use oids directly instead of hashing
> them... but then it's easy to create a pathological pattern of DDL
> that turns your hash table into a linked list. Not sure what to think
> about that.)
>
>>> I had to tweak it to support "in-place" creation and
>>> fixed size (in other words, no allocators, for use in shared memory).
>
>> We really need to have a test to know what happens when this structure goes out of memory, as you mentioned below. What would be an appropriate place for simplehash tests?
>
> Good questions. This has to be based on being guaranteed to have
> enough space for all of the entries, so the question is really just
> "how bad can performance get with different load factors". FWIW there
> were some interesting cases with clustering when simplehash was first
> used in the executor (see commits ab9f2c42 and parent) which required
> some work on hashing quality to fix.
Interesting read. I didn't know much about simplehash, but it seems there are still many cases where it could be put to good use. I always wondered why Postgres uses only Larson's linear hash.

>
>>> Then I was annoyed that I had to add a "status" member to our struct,
>>> so I tried to fix that.
>
>> Indeed, sizeof(SlruMappingTableEntry) == 9 seems strange. Will simplehash align it well?
>
> With that "intrusive status" patch, the size is back to 8. But I
> think I made a mistake: I made it steal some key space to indicate
> presence, but I think the presence test should really get access to
> the whole entry so that you can encode it in more ways. For example,
> with slotno == -1.
>
> Alright, considering the date, if we want to get this into PostgreSQL
> 14 it's time to make some decisions.
>
> 1. Do we want customisable SLRU sizes in PG14?
>
> +1 from me, we have multiple reports of performance gains from
> increasing various different SLRUs, and it's easy to find workloads
> that go faster.
Yes, this is the main point of this discussion. So +1 from me too.

>
> One thought: it'd be nice if the user could *see* the current size,
> when using the default. SHOW clog_buffers -> 0 isn't very helpful if
> you want to increase it, but don't know what it's currently set to.
> Not sure off the top of my head how best to do that.
Don't we expect the SHOW command to report exactly the same value as the config file or the SET command? If no such convention exists, showing the effective value is probably a good idea.

> 2. What names do we want the GUCs to have? Here's what we have:
>
> Proposed GUC               Directory             System views
> clog_buffers               pg_xact               Xact
> multixact_offsets_buffers  pg_multixact/offsets  MultiXactOffset
> multixact_members_buffers  pg_multixact/members  MultiXactMember
> notify_buffers             pg_notify             Notify
> serial_buffers             pg_serial             Serial
> subtrans_buffers           pg_subtrans           Subtrans
> commit_ts_buffers          pg_commit_ts          CommitTs
>
> By system views, I mean pg_stat_slru, pg_shmem_allocations and
> pg_stat_activity (lock names add "SLRU" on the end).
>
> Observations:
>
> It seems obvious that "clog_buffers" should be renamed to "xact_buffers".
+1
> It's not clear whether the multixact GUCs should have the extra "s"
> like the directories, or not, like the system views.
I think we should break the tie by a native English speaker's ear or typing habits. I'm not a native speaker.

> I see that we have "Shared Buffer Lookup Table" in
> pg_shmem_allocations, so where I generated names like "Subtrans
> Mapping Table" I should change that to "Lookup" to match.
>
> 3. What recommendations should we make about how to set it?
>
> I think the answer depends partially on the next questions! I think
> we should probably at least say something short about the pg_stat_slru
> view (cache miss rate) and pg_stat_activity view (waits on locks), and
> how to tell if you might need to increase it. I think this probably
> needs a new paragraph, separate from the docs for the individual GUC.
I can only suggest an incident-driven approach.
1. Observe a ridiculous number of backends waiting on a particular SLRU.
2. Double the buffers for that SLRU.
3. Goto 1.
I don't think we should mention this approach in the docs.
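
The loop above, together with the cache miss rate mentioned in the quoted text, could be sketched as follows. This is only an illustration: the function names, the wait threshold and the upper bound are hypothetical. The inputs are assumed to come from the blks_hit and blks_read columns of pg_stat_slru and from a count of waiting backends in pg_stat_activity.

```python
def slru_miss_rate(blks_hit: int, blks_read: int) -> float:
    """Fraction of SLRU page accesses that had to read from disk.

    blks_hit and blks_read are assumed to be the pg_stat_slru counters
    for a single SLRU.
    """
    total = blks_hit + blks_read
    return blks_read / total if total else 0.0


def next_buffers(current: int, waiting_backends: int,
                 wait_threshold: int, max_buffers: int) -> int:
    """One step of the incident-driven loop: double the buffers when too
    many backends wait on this SLRU, otherwise keep the current setting.

    wait_threshold and max_buffers are hypothetical knobs, not GUCs.
    """
    if waiting_backends > wait_threshold:
        return min(current * 2, max_buffers)
    return current


if __name__ == "__main__":
    print(slru_miss_rate(blks_hit=750, blks_read=250))
    print(next_buffers(16, waiting_backends=50,
                       wait_threshold=10, max_buffers=1024))
```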

> 4. Do we want to ship the dynahash patch?

This patch lets us throw an unlimited amount of memory at the problem of SLRU waiting for IO, so the scale of improvement is much higher. Do I want us to ship this patch? Definitely. Does it change much? I don't know.

>
> +0.9. The slight hesitation is that it's new code written very late
> in the cycle, so it may still have bugs or unintended consequences,
> and as you said, at small sizes the linear search must be faster than
> the hash computation. Could you help test it, and try to break it?
I'll test it and try to break it.

> Can we quantify the scaling effect for some interesting workloads, to
> see at what size the dynahash beats the linear search, so that we can
> make an informed decision?
I think we cannot statistically distinguish linear search from hash search through SLRU behaviour alone. But we can create some synthetic benchmarks.
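
One possible shape for such a synthetic benchmark, as a rough sketch only. It times a linear scan over a slot array against a hash-table lookup for several buffer counts. Python timings say nothing about the crossover point in C, and the buffer counts and the worst-case probe (last slot) are assumptions made for illustration.

```python
import timeit


def linear_lookup(page_numbers, pageno):
    """Scan the slot array the way a linear SLRU search would."""
    for slotno, p in enumerate(page_numbers):
        if p == pageno:
            return slotno
    return -1


def hash_lookup(mapping, pageno):
    """Look the page up in a pageno -> slotno hash table."""
    return mapping.get(pageno, -1)


def compare(nbuffers, repeats=2000):
    """Return (linear_seconds, hash_seconds) for a worst-case probe."""
    page_numbers = list(range(nbuffers))
    mapping = {p: s for s, p in enumerate(page_numbers)}
    probe = page_numbers[-1]  # last slot: worst case for the linear scan
    t_linear = timeit.timeit(
        lambda: linear_lookup(page_numbers, probe), number=repeats)
    t_hash = timeit.timeit(
        lambda: hash_lookup(mapping, probe), number=repeats)
    return t_linear, t_hash


if __name__ == "__main__":
    for n in (8, 32, 128, 1024):
        t_linear, t_hash = compare(n)
        print(f"{n:5d} buffers: linear {t_linear:.6f}s, hash {t_hash:.6f}s")
```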

> Of course, without a hash table, large
> sizes will surely work badly, so it'd be tempting to restrict the size
> you can set the GUC to.
>
> If we do include the dynahash patch, then I think it would also be
> reasonable to change the formula for the default, to make it higher on
> large systems. The restriction to 128 buffers (= 1MB) doesn't make
> much sense on a high frequency OLTP system with 128GB of shared
> buffers or even 4GB. I think "unleashing better defaults" would
> actually be bigger news than the GUC for typical users, because
> they'll just see PG14 use a few extra MB and go faster without having
> to learn about these obscure new settings.
I agree. I don't see why we would need to limit buffers to 128 once hash search is available.
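
To make the 128-buffer cap concrete, here is a small sketch of how a default of the form Min(128, Max(4, NBuffers / 512)) behaves. That formula is an assumption about the pre-patch clog default and is not stated in this thread. BLCKSZ is assumed to be the usual 8kB.

```python
BLCKSZ = 8192  # assumed default PostgreSQL block size, in bytes


def clog_default_buffers(shared_buffers_bytes: int) -> int:
    """Default clog buffer count under the assumed pre-patch formula."""
    nbuffers = shared_buffers_bytes // BLCKSZ
    return min(128, max(4, nbuffers // 512))


if __name__ == "__main__":
    for label, size in (("128MB", 128 << 20), ("4GB", 4 << 30),
                        ("128GB", 128 << 30)):
        n = clog_default_buffers(size)
        print(f"shared_buffers={label}: {n} buffers = {n * BLCKSZ // 1024} kB")
```

Under this assumption, every system from 4GB of shared buffers upwards ends up at the same 128 buffers (1MB), which is the restriction being questioned.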

> 5. Do we want to ship the simplehash patch?
>
> -0.5. It's a bit too exciting for the last minute, so I'd be inclined
> to wait until the next cycle to do some more research and testing. I
> know it's a better idea in the long run.
OK, that's obviously the safer decision.

My TODO list:
1. Try to break patch set v13-[0001-0004].
2. Think about how to measure the performance of linear search versus hash search in SLRU buffer mapping.

Thanks!

Best regards, Andrey Borodin.
