| From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com> |
| Subject: | Re: Shared hash table allocations |
| Date: | 2026-04-02 10:24:29 |
| Message-ID: | CAEze2WhYsCNRd3E9qGSZbXd5k0UVa7xgMZ1V6tARRKezPPEFUw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, 31 Mar 2026 at 23:25, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 31/03/2026 01:02, Heikki Linnakangas wrote:
> > I wonder if we should change the defaults somehow. In usual
> > configurations, people are currently getting much more lock space than
> > you'd expect based on max_connections and max_locks_per_transaction, and
> > after these patches, they'll get far fewer locks. It might be prudent
> > to bump up the default max_locks_per_transaction setting so that you'd
> > get roughly the same number of locks in the default configuration.
>
> master: With the default configuration on master, the attached test
> procedure can take 14927 locks before hitting an "out of shared memory"
> error. At that point, all the "wiggle room" is assigned to the LOCK
> hash table. A different scenario could make the PROCLOCK hash table
> consume all the wiggle room instead, but I believe running out of LOCK
> space is more common, and I don't think it changes the big picture
> anyway if you hit the ceiling with PROCLOCK instead.
>
> 0001: [...]
LGTM
> 0002: As the next step, I also removed the 10% safety margin from
> lock.c. That reduced memory usage by another 320 kB, and the number of
> locks went down from 14159 to 12815.
LGTM
> 0003: In patch 0003 I removed that flexibility by marking them both with
> HASH_FIXED_SIZE, and making init_size equal to max_size. That also stops
> the hash tables from using any of the other remaining wiggle room,
> making them truly fixed-size.
I think this patch finally gave me a good reason why PROCLOCK needed
to be allocated at double the size of LOCK:
LOCK is (was) initialized with only 50% of its max capacity. If
PROCLOCK were initialized with the same parameters and all spare shmem
were then allocated to other processes, backends wouldn't be able to
safely use max_locks_per_transaction. To guarantee no "out of shared
memory" errors when all backends use max_locks_per_transaction,
PROCLOCK's size must be doubled to make sure it has sufficient space.
(The same usually isn't an issue for LOCK, because backends are very
likely to operate on the same tables, and can thus share most of the
LOCK structs.)
Now that LOCK is fully allocated, I think the size doubling can be
removed, or possibly parameterized for those that need it.
> 0004: To buy back that lock manager space in common out-of-the box
> situations, I propose to bump up the default for
> max_locks_per_transactions from 64 to 128. [...]
> The number of locks you can
> take after that is 17535, which is more than on master (14927).
Note that this is for one backend; with the current sizing, at least
one more backend could lock the same 17535 locks.
Patch LGTM.
> Any thoughts, objections?
Overall, I'm +1 on this change. I do have some general comments,
though, at least in part based on discussions in the hackers Discord
last year[0]:
1.) We'll need to clearly advertise the changed, stricter behaviour
of the heavy-weight locking system in the release notes.
2.) (Related) We should probably make it easier for DBAs to monitor
lock counts now that we enforce the limit more strictly. This could
take the form of (optional) logging that alerts when a session exceeds
some threshold number of locks in a transaction (e.g. 100% and 200% of
max_locks_per_transaction), or of a metric in
pg_stat_{activity,databases} exposing the total and maximum number of
locks taken in a transaction.
3.) (Related) We should probably parameterize the LOCK-to-PROCLOCK
ratio. LOCK is large, and especially on systems with high values of
max_connections (where the additional LOCKs will go unused) carrying
all those additional LOCKs can account for up to 50% of the added
memory usage (LOCK at 152+24=176 B, PROCLOCK at 2*(64+24)=176 B).
It'd be nice if we could avoid allocating that memory.
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
[0] starting at
https://discord.com/channels/1258108670710124574/1266090488415654032/1442879718285119518