| From: | Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Addressing buffer private reference count scalability issue |
| Date: | 2026-03-11 18:03:34 |
| Message-ID: | CAE8JnxNqf=sYB-hfeHBEtXi+aC8jezqqukdgRuR2=t8nbetL=w@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Hackers,
Please find attached version 2 of this patch and the results of the
ReadBuffer/ReleaseBuffer benchmark.
For those who prefer plain text, here is an excerpt:
num_buffers  pattern     baseline   patch
1            random        194.76  183.50
1            sequential    183.75  168.02
2            random        204.74  187.02
2            sequential    195.27  169.58
3            random        222.62  192.57
3            sequential    212.81  169.55
7            random        233.96  219.64
7            sequential    221.94  171.94
11           random        365.50  247.96
11           sequential    318.24  183.96
19           random        390.58  267.96
19           sequential    369.51  195.35
100          random        424.03  286.33
100          sequential    420.74  280.23
988          random        445.75  301.37
988          sequential    453.61  313.30
10000        random        587.13  418.91
10000        sequential    497.92  394.10
On Sun, Mar 8, 2026 at 5:09 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2026-03-08 16:09:07 +0000, Alexandre Felipe wrote:
> > 2.
> >
> > Refactoring reference counting: Before starting to change code and
> > potentially breaking things I considered it prudent to isolate it to
> > limit the damage. This code was part of an 8k+ LOC file.
>
> Not allowing, at least the fast paths, to be inlined will make the most
> common
> cases measurably slower, I've tried that.
>
I made it inline this time. I placed it in a separate file; I am including
the .h at the top and the .c.
> > Here I assume that the buffer sequence is independent enough from the
> > array size, so I use the buffer as the hash key directly, omitting a hash
> > function call.
>
> I doubt that that's good enough. The hash table code relies on the bits
> being
> well mixed, but you won't get that with buffer ids.
>
I introduced murmurhash32. It is noticeably worse for sequential access
(by buffer index), but it does provide some mixing.
> I'm rather sceptical that the overhead of needing to shift and mask is worth
> it. I suspect we'll also want to add more state for each entry (e.g. I think
> it may be worth getting rid of the io-in-progress resowner).
>
The io-in-progress resowner's name suggests that it runs only for reads...
so maybe it could be taken out of the way of buffer hits?
> > REFCOUNT_ARRAY_ENTRIES=0: since the simplehash is basically some array
> > lookup, it is worth trying to remove it completely and keep only the hash.
> > For small values we would be trading a few branches for a buffer % SIZE,
> > for the use case of prefetch where pin/unpin in a FIFO fashion, it will
> > save an 8-entry array lookup, and some extra data moves.
>
> I doubt that that's ok, in the vast majority of access we will have 0-2
> buffers pinned. And even when we have pinned more buffers, it's *exceedingly*
> common to access the same entry repeatedly (e.g. pin, lock, unlock, unpin),
> adding a few cycles to each of those repeated accesses will quickly show up.
>
I brought back the array, but I eliminated the linear search.
1. USE_REFCOUNT_CACHE_ENTRY enables the last-entry cache.
2a. The dynamic array case
(REFCOUNT_ARRAY_MAX_SIZE != REFCOUNT_ARRAY_INITIAL_SIZE)
grows the array when it reaches a certain occupation level.
I have set the default occupation level to 86% so that, if enabled, for
random input the array grows when we have about 2*size pins in total.
If we find a sequential pattern, the array grows without growing the hash
table.
For the array lookup I don't use a hash, so for a small number of pins
it is very fast.
2b. The static case
(REFCOUNT_ARRAY_MAX_SIZE == REFCOUNT_ARRAY_INITIAL_SIZE)
uses a static array, just as we had before, and does not perform the
linear search. It still has to read the size and mask the input.
I tested the 4 variations, and the winner is the static array without
the last-entry cache.
I increased the array size from 8 to 32, since you suggested before that
this could help. Back then it would have come with the tradeoff of a longer
linear search, so it may help even more now.
> > Incrementing/decrementing the count continue the same, just using
> > 4 instead of 1, lock mode access will require one or two additional
> > bitwise operations.
> >
> > No bit-shifts are required.
>
> I don't know how that last sentence can be true, given that:
>
> > -struct PrivateRefCountEntry
> > +#define PRIVATEREFCOUNT_LOCKMODE_MASK 0x3
> > +#define ONE_PRIVATE_REFERENCE 4 /* 1 << 2 */
> > +
> > +#define PrivateRefCountGetLockMode(d) ((BufferLockMode)((d) & PRIVATEREFCOUNT_LOCKMODE_MASK))
> > +#define PrivateRefCountSetLockMode(d, m) ((d) = ((d) & ~PRIVATEREFCOUNT_LOCKMODE_MASK) | (m))
> > +#define PrivateRefCountGetRefCount(d) ((int32)((d) >> 2))
> > +#define PrivateRefCountIsZero(d) ((d) < ONE_PRIVATE_REFERENCE)
>
> Involves shifts at least in PrivateRefCountGetRefCount()..
Yes, there was some unintended code there.
I am adding SharedBufferHasSingleRef to replace the (...) == 1 checks.
The actual count is used only for debugging, so the shift there shouldn't
be a problem.
Regards,
Alexandre
| Attachment | Content-Type | Size |
|---|---|---|
| compare-read.png | image/png | 156.2 KB |
| v2-0004-Compact-PrivateRefCountEntry.patch | application/x-patch | 12.8 KB |
| v2-0002-Refactoring-reference-counting.patch | application/x-patch | 54.0 KB |
| v2-0003-Using-simplehash.patch | application/x-patch | 13.2 KB |
| v2-0001-Benchmark-buffer-pinning.patch | application/x-patch | 22.3 KB |
| v2-0005-Array-direct-mapping.patch | application/x-patch | 15.3 KB |