Addressing buffer private reference count scalability issue

From: Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Addressing buffer private reference count scalability issue
Date: 2026-03-08 16:09:07
Message-ID: CAE8JnxNTETEUiAOF31=_yo=pvyAi9npOeJfcTvEJJbi4vomtYA@mail.gmail.com
Lists: pgsql-hackers

Hi Hackers,

This patch set addresses a performance issue pointed out by Andres Freund [1].
It is split into five patches:

1. Benchmark buffer pinning: The usual benchmark code. I implemented a few
functions that can be used in Postgres queries, and a Python script that
runs them and produces CSV files and SVG plots for the current build.
2. Refactoring reference counting: Before changing code and potentially
breaking things, I considered it prudent to isolate the reference counting
code to limit the damage. This code was part of an 8k+ LOC file.
3. Using simplehash: Simply replacing the HTAB with a simplehash, and adding
a new set of macros, SH_ENTRY_EMPTY, SH_MAKE_EMPTY and SH_MAKE_IN_USE, which
allow using the InvalidBuffer special value to mark empty entries instead of
allocating extra space for a validity flag. Here I assume that the buffer
sequence is independent enough from the array size, so I use the buffer as
the hash key directly, omitting a hash function call.
4. Compact PrivateRefCountEntry: The original implementation used a 4-byte
key and an 8-byte value. The reference count uses 32 bits, and it is
unreasonable to expect one backend to pin the same buffer 1 billion times.
The lock mode also uses 32 bits but can only assume 4 values. So I packed
them into a single uint32, giving 30 bits to the count and 2 bits to the
lock mode. This makes the entries 8 bytes long; on 64-bit CPUs this is a
memory reduction of about one third (12 bytes down to 8). It also lines the
array up with 64-bit words: every entry is aligned, and copying one entry
can be completed in one instruction.
5. REFCOUNT_ARRAY_ENTRIES=0: Since the simplehash is basically an array
lookup, it is worth trying to remove the separate array completely and keep
only the hash. For small values we would be trading a few branches for a
buffer % SIZE. For the prefetch use case, where buffers are pinned and
unpinned in FIFO fashion, it will save an 8-entry array lookup and some
extra data moves.
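
To make items 3 and 4 concrete, here is a minimal standalone C sketch. It is
not the code from the patches. It assumes Buffer is a 32-bit int and
InvalidBuffer is 0, as in PostgreSQL's buf.h. The names CompactRefCountEntry,
entry_set, entry_refcount, entry_lockmode, table_lookup and TABLE_SIZE are
hypothetical, and plain linear probing stands in for simplehash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t Buffer;          /* assumption: matches PostgreSQL's "int Buffer" */
#define InvalidBuffer 0          /* 0 never names a real buffer */

#define LOCKMODE_BITS 2u
#define LOCKMODE_MASK ((1u << LOCKMODE_BITS) - 1u)
#define MAX_REFCOUNT  ((1u << (32u - LOCKMODE_BITS)) - 1u)
#define TABLE_SIZE    16u        /* illustrative; must be a power of two */

/* Hypothetical compact entry: 4-byte key plus 4-byte packed value. */
typedef struct CompactRefCountEntry
{
    Buffer   buffer;             /* InvalidBuffer marks an empty slot */
    uint32_t packed;             /* refcount in the upper 30 bits, lock mode in the low 2 */
} CompactRefCountEntry;

static_assert(sizeof(CompactRefCountEntry) == 8,
              "entry should fit in one 64-bit word");

static inline uint32_t
entry_refcount(const CompactRefCountEntry *e)
{
    return e->packed >> LOCKMODE_BITS;
}

static inline uint32_t
entry_lockmode(const CompactRefCountEntry *e)
{
    return e->packed & LOCKMODE_MASK;
}

static inline void
entry_set(CompactRefCountEntry *e, uint32_t refcount, uint32_t lockmode)
{
    assert(refcount <= MAX_REFCOUNT && lockmode <= LOCKMODE_MASK);
    e->packed = (refcount << LOCKMODE_BITS) | lockmode;
}

/*
 * Find the slot for "buffer", optionally claiming an empty one.
 * The buffer number is used as the hash directly (no hash function call).
 * Returns NULL if the buffer is absent and create is false, or if the
 * table is full.
 */
static inline CompactRefCountEntry *
table_lookup(CompactRefCountEntry *table, Buffer buffer, bool create)
{
    uint32_t start = (uint32_t) buffer & (TABLE_SIZE - 1u);

    assert(buffer != InvalidBuffer);
    for (uint32_t i = 0; i < TABLE_SIZE; i++)
    {
        CompactRefCountEntry *e = &table[(start + i) & (TABLE_SIZE - 1u)];

        if (e->buffer == buffer)
            return e;
        if (e->buffer == InvalidBuffer)
        {
            if (!create)
                return NULL;
            e->buffer = buffer;  /* claim the empty slot */
            e->packed = 0;
            return e;
        }
    }
    return NULL;
}
```

A zero-initialized table is entirely empty because InvalidBuffer is 0, so no
separate validity flag is needed. The sketch leaves out deletion and resizing,
which a real open-addressing table has to handle.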

In addition to the patches, I am including:

- A bash script to apply and benchmark the patches sequentially. You might
have to adjust REPO_ROOT; in my case the script derives it from its own
path, which is under $REPO_ROOT/.patches/pins/.
- A compare-patches.py script that can be copied to
src/test/modules/test_buffer_pin to turn the benchmark CSV into figures
showing one metric for different patches, instead of different metrics for
one patch as benchmark.py produces.
- A nicely formatted post about this [2]

Regards,
Alexandre

[1] https://www.postgresql.org/message-id/s5p7iou7pdhxhvmv4rohmskwqmr36dc4rybvwlep5yvwrjs4pa%406oxsemms5mw4
[2] https://afelipe.hashnode.dev/postgres-backend-buffer-pinning-algorithm

Attachment Content-Type Size
v1-0003-Using-simplehash.patch application/octet-stream 12.8 KB
v1-0004-Compact-PrivateRefCountEntry.patch application/octet-stream 10.8 KB
v1-0002-Refactoring-reference-counting.patch application/octet-stream 50.6 KB
v1-0001-Benchmark-buffer-pinning.patch application/octet-stream 26.6 KB
v1-0005-REFCOUNT_ARRAY_ENTRIES-0.patch application/octet-stream 6.5 KB
run-all.sh text/x-sh 2.7 KB
compare-patches.py text/x-python-script 3.4 KB
