| From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com> |
| Subject: | Re: Shared hash table allocations |
| Date: | 2026-04-02 12:55:23 |
| Message-ID: | CAExHW5uWdU1iEM_eVFVVmaHqfjLpq0QrdFUeZjtBDYpNwfuRBg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Apr 1, 2026 at 2:55 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 31/03/2026 01:02, Heikki Linnakangas wrote:
> > I wonder if we should change the defaults somehow. In usual
> > configurations, people are currently getting much more lock space than
> > you'd expect based on max_connections and max_locks_per_transaction, and
> > after these patches, they'll get far fewer locks. It might be prudent to
> > bump up the default max_locks_per_transaction setting so that you'd get
> > roughly the same number of locks in the default configuration.
>
> I did some testing of the memory usage and how removing the wiggle room
> affects the number of locks you can acquire. Attached are the test
> procedures I used, and proposed patches. The patches are new, designed
> to just change the parameters of the hash tables and shmem calculations
> with no other changes. They don't include the refactorings we've
> discussed so far in this thread. My plan is to commit these new patches
> first, and those other refactorings after that. Once these new patches
> are committed, the refactorings won't materially change the overall
> memory usage or how it's divided between different hash tables, all
> those effects are in these new patches.
>
>
> master: With the default configuration on master, the attached test
> procedure can take 14927 locks before hitting "out of shared memory"
> error. At that point, all the "wiggle room" is assigned for the LOCK
> hash table. A different scenario could make the PROCLOCK hash table
> consume all the wiggle room instead, but I believe running out of LOCK
> space is more common, and I don't think it changes the big picture
> anyway if you hit the ceiling with PROCLOCK instead.
>
> 0001: While looking at this, I noticed that we add 10% "safety margin"
> to the shmem calculations in predicate.c, but we had already marked the
> predicate.c hash tables as HASH_FIXED_SIZE so they were never able to
> make use of the safety margin. Oops. The extra memory was available for
> the lock.c hash tables, though. After removing that bogus 10% safety
> margin from predicate.c, memory usage was reduced by 200 kB, but the
> number of locks you could take went down from 14927 to 14159.
>
> 0002: As the next step, I also removed the 10% safety margin from
> lock.c. That reduced memory usage by another 320 kB, and the number of
> locks went down from 14159 to 12815.
>
> 0003: After those changes, there's only a little extra memory sloshing
> around that's not accounted for by any data structure. ipci.c reserves a
> constant 100 kB, but that's pretty much it. However, there's still
> flexibility between the LOCK and the PROCLOCK hash tables. The PROCLOCK
> hash table is estimated to be 2x the size of the LOCK table, but when
> it's not, the space can get assigned to the LOCK table instead. In patch
> 0003 I removed that flexibility by marking them both with
> HASH_FIXED_SIZE, and making init_size equal to max_size. That also stops
> the hash tables from using any of the other remaining wiggle room,
> making them truly fixed-size. This doesn't change the overall shared
> memory allocated, but the number of locks that the test procedure could
> acquire went down from 12815 to 8767, mostly because it cannot "steal"
> space from PROCLOCK anymore.
>
> 0004: To buy back that lock manager space in common out-of-the-box
> situations, I propose to bump up the default for
> max_locks_per_transaction from 64 to 128. That increases memory usage
> again by 3216 kB, making it 2696 kB higher than on master (remember that
> the previous changes reduced memory usage). The number of locks you can
> take after that is 17535, which is more than on master (14927).
>
> Increasing the default won't affect users who have already set
> max_locks_per_transaction to a non-default value. They will see that the
> number of locks they can acquire with their existing configuration will
> be reduced, again because of the lost wiggle room and flexibility
> between LOCK and PROCLOCK. Not sure if we could or should do something
> about that. Probably best to just document in the release notes that if
> you had increased max_locks_per_transaction, you might need to raise it
> further to be able to accommodate the same number of locks as before.
>
> Here's all that in table form:
>
> | Patch | Shmem (kB) | Locks |
> | --------------------------------------+------------|-------|
> | master | 153560 | 14927 |
> | 0001: remove 10% from predicate.c | 153360 | 14159 |
> | 0002: remove 10% from lock.c | 153040 | 12815 |
> | 0003: Make lock.c tables fixed size | 153040 | 8767 |
> | 0004: max_locks_per_transaction=128  | 156256 | 17535 |
>
> This increase in memory usage is not great, but it's not that big in the
> grand scheme of things. I think it's well worth it, and better than the
> sloppy scheme we have today.
>
> Any thoughts, objections?
When we "allocate" shared memory, we are really just reserving address
space on systems that use mmap. Physical memory is allocated only when a
page is first touched. The wiggle room as a whole is never touched during
initialization; those pages get allocated only when the wiggle room is
actually used, i.e. when entries beyond the initial number are allocated.
By allocating maximal hash tables, I was worried that we would allocate
more memory than required. But that's not true, since a 4K memory page
fits only 50-60 entries - far fewer than the default configuration
permits. Most of the memory for the hash table will be allocated only as
the entries are used.
The second hazard of increasing the hash table size is that access
becomes slower as the table becomes sparser [1]. I don't think it will
show up in practice, but it may be worth trying a trivial pgbench run,
just to make sure that performance with the default configuration
doesn't regress.
The increase in memory usage is 3MB, which is usually fine. I mean, we
didn't hear any complaints when we increased the default size of the
shared buffer pool, and this is much less than that. But why do you want
to double max_locks_per_transaction? I first thought it was because the
hash table size is rounded to a power of 2, but the size of the hash
table is actually max_locks_per_transaction * (number of backends +
number of prepared transactions). What we want is a default
max_locks_per_transaction that allows 14927 locks. Playing with
max_locks_per_transaction using your script, 109 seems to be the value
that gives us 14951 locks. It looks (and is) an odd number. If we are
worried about the memory increase, that's the number we should use as
the default, and then write a long paragraph about why we chose such an
odd-looking number :D.
I think we should highlight the change of default in the release notes
though. Users running the default configuration will notice an increase
in memory usage, and those using a custom value will need to consider
bumping it up. Can we give them some ballpark percentage by which they
should increase their max_locks_per_transaction? E.g. double the number,
or something like that?
I looked at the places where max_locks_per_transaction is used, and I
don't see any place that needs code updates other than the ones in the
patch.
LGTM, overall.
[1] https://ashutoshpg.blogspot.com/2025/07/efficiency-of-sparse-hash-table.html
--
Best Wishes,
Ashutosh Bapat