| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com> |
| Subject: | Re: Shared hash table allocations |
| Date: | 2026-04-02 14:14:46 |
| Message-ID: | 83e37829-0d94-49b2-ad48-5feb7b5d5e44@iki.fi |
| Lists: | pgsql-hackers |
On 02/04/2026 15:55, Ashutosh Bapat wrote:
> When we "allocate" shared memory, we are just reserving address space on
> systems which use mmap. The memory gets allocated only when it is
> touched. The wiggle room as a whole is never touched during
> initialization. Those pages get allocated when the wiggle room is used,
> i.e. when entries beyond the initial number are allocated. By
> allocating maximal hash tables, I was worried that we would allocate
> more memory than required. But that's not true, since a 4K memory page
> fits only 50-60 entries - far fewer than the default configuration
> permits. Most of the memory for the hash table will be allocated as
> the entries are used.
Hmm, that's a good point about untouched memory not being allocated. I
think it's fine, though.
With small changes on top of the earlier refactorings from this
thread, we could stop pre-allocating all the elements when a shared
memory hash table is created, and have ShmemHashAlloc() allocate them on
the fly. But instead of doing them as anonymous allocations like we do
with ShmemAlloc() today, the allocations could come from the
pre-allocated region dedicated to the hash table. You'd still get the
same determinism and visibility in pg_shmem_allocations, but you could
avoid actually touching the pages until they're needed. Not sure it's
worth the trouble.
> The second hazard of increasing hash table size is the hash table
> access becomes slower as it becomes sparse [1]. I don't think it shows
> up in performance but maybe worth trying a trivial pgbench run, just
> to make sure that default performance doesn't regress.
Interesting, but yeah I don't think that's going to be measurable. I did
some quick testing with a test function that just locks and unlocks
relations:
PG_FUNCTION_INFO_V1(test_lock_bench);

Datum
test_lock_bench(PG_FUNCTION_ARGS)
{
	int32		num_distinct_locks = PG_GETARG_INT32(0);
	int32		num_acquires = PG_GETARG_INT32(1);
	LOCKMODE	lockmode = AccessExclusiveLock;

#define FIRST_RELID 1000000000

	for (int32 i = 0; i < num_acquires; i++)
	{
		Oid			relid = FIRST_RELID + i % num_distinct_locks;

		if (i >= num_distinct_locks)
			UnlockRelationOid(relid, lockmode);
		if (!ConditionalLockRelationOid(relid, lockmode))
		{
			elog(LOG, "could not acquire lock, iteration %d", i);
			break;
		}
	}
	PG_RETURN_VOID();
}
With test_lock_bench(1, 5000000), I don't see any meaningful difference,
i.e. it's within 1-2 %, with anything from max_locks_per_transaction=10
to max_locks_per_transaction=128.
With more distinct locks involved, the caching effects might be bigger,
and maybe you'd see a difference because of more or fewer collisions.
Spot-testing some values on my laptop, though, I don't see anything that
would worry me.
> The increase in memory usage is 3MB, which is usually fine. I mean, we
> didn't hear any complaints when we increased the default size of the
> shared buffer pool - this is much less than that. But why do you want
> to double max_locks_per_transaction? I first thought it's because
> the hash table size is anyway a power of 2. But then the size of the
> hash table is actually max_locks_per_transaction * (number of backends
> + number of prepared transactions). What we want is a default
> max_locks_per_transaction such that 14927 locks are allowed. Playing
> with max_locks_per_transaction using your script, 109 seems to be the
> number which will give us 14951 locks. It looks (and is) an odd
> number. If we are worried about memory increase, that's the number we
> should use as the default, and then write a long paragraph about why we
> chose such an odd-looking number :D.
My first thought was actually to set max_locks_per_transaction=100,
making it a nice round number :-). But then the neighboring default of
max_pred_locks_per_transaction=64 looks weird. We could reduce it to
max_pred_locks_per_transaction=50 to make it fit in. But it feels a
little arbitrary to change that just for aesthetic reasons.
> I think we should highlight the change in default in the release notes
> though. Users who use the default configuration will notice an
> increase in memory usage. If they are using a custom value, they will
> think of bumping it up. Can we give them some ballpark % by which they
> should increase their max_locks_per_transaction? E.g. double the
> number or something?
I don't think people who are using the defaults will notice. I'm worried
about the people who have set max_locks_per_transaction manually, and
now effectively get less lock space for the same setting. Yeah, doubling
the previous value is a good rule of thumb.
- Heikki