| From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com> |
| Subject: | Re: Shared hash table allocations |
| Date: | 2026-04-02 14:52:32 |
| Message-ID: | CAExHW5uNtqoSwZ0r+JXgxBSi4V98KfQCWuBxRheYTB40pf7FEg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Apr 2, 2026 at 7:44 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 02/04/2026 15:55, Ashutosh Bapat wrote:
> > When we "allocate" shared memory, we are just allocating space on
> > systems which use mmap. The memory gets allocated only when it is
> > touched. The wiggle room as a whole is never touched during
> > initialization. Those pages get allocated when wiggle room is used -
> > i.e. when the entries beyond initial number are allocated. By
> > allocating maximal hash tables, I was worried that we will allocate
> > more memory than required. But that's not true since a 4K memory page
> > fits only 50-60 entries - far less than the default configuration
> > permits. Most of the memory for the hash table will be allocated as
> > the entries are used.
>
> Hmm, that's a good point about untouched memory not being allocated. I
> think it's fine, though.
>
> With small changes on top of the earlier refactorings from this
> thread, we could stop pre-allocating all the elements when a shared
> memory hash table is created, and have ShmemHashAlloc() allocate them on
> the fly, but instead of doing them as anonymous allocations like we do
> with ShmemAlloc() today, the allocations could come from the
> pre-allocated region dedicated to the hash table. You'd still get the
> same determinism and visibility in pg_shmem_allocations, but you could
> avoid actually touching the pages until they're needed. Not sure it's
> worth the trouble.
With the shared hash table refactoring + shared memory structure
refactoring + resizable structures, we should be able to get resizable
shared hash tables as well. But that's not required immediately. I feel
large hash tables, like the buffer lookup table and the lock hash
tables, can benefit from this kind of thing.
>
> > The second hazard of increasing hash table size is the hash table
> > access becomes slower as it becomes sparse [1]. I don't think it shows
> > up in performance but maybe worth trying a trivial pgbench run, just
> > to make sure that default performance doesn't regress.
>
> Interesting, but yeah I don't think that's going to be measurable. I did
> some quick testing with a test function that just locks and unlocks
> relations:
>
> PG_FUNCTION_INFO_V1(test_lock_bench);
> Datum
> test_lock_bench(PG_FUNCTION_ARGS)
> {
>     int32       num_distinct_locks = PG_GETARG_INT32(0);
>     int32       num_acquires = PG_GETARG_INT32(1);
>
>     LOCKMODE    lockmode = AccessExclusiveLock;
>
> #define FIRST_RELID 1000000000
>
>     for (int32 i = 0; i < num_acquires; i++)
>     {
>         Oid         relid = FIRST_RELID + i % num_distinct_locks;
>
>         if (i >= num_distinct_locks)
>             UnlockRelationOid(relid, lockmode);
>
>         if (!ConditionalLockRelationOid(relid, lockmode))
>         {
>             elog(LOG, "could not acquire lock, iteration %d", i);
>             break;
>         }
>     }
>
>     PG_RETURN_VOID();
> }
>
> With test_lock_bench(1, 5000000), I don't see any meaningful difference,
> i.e. it's within 1-2 %, with anything from max_locks_per_transaction=10
> to max_locks_per_transaction=128.
>
> With more distinct locks involved, the caching effects might be bigger,
> and maybe you'd see a difference because of more or less collisions.
> Spot testing some values on my laptop, I don't see anything that would
> worry me though.
Great. This agrees with my experiments with sparse buffer lookup table.
>
> > The increase in memory usage is 3MB, which is fine usually. I mean, we
> > didn't hear any complaints when we increased the default size of the
> > shared buffer pool - this is much less than that. But why do you want
> > to double the max_locks_per_transaction? I first thought it's because
> > the hash table size is anyway a power of 2. But then the size of the
> > hash table is actually max_locks_per_transaction * (number of backends
> > + number of prepared transactions). What we want is the default
> > max_locks_per_transaction such that 14927 locks are allowed. Playing
> > with max_locks_per_transaction using your script, 109 seems to be the
> > number which will give us 14951 locks. It looks (and is) an odd
> > number. If we are worried about memory increase, that's the number we
> > should use as default and then write a long paragraph about why we
> > chose such an odd-looking number :D.
>
> My first thought was actually to set max_locks_per_transaction=100,
> making it a nice round number :-). But then the neighboring default of
> max_pred_locks_per_transaction=64 looks weird. We could reduce it to
> max_pred_locks_per_transaction=50 to make it fit in. But it feels a
> little arbitrary to change just for aesthetic reasons.
+1. Let's keep it at 128 and see if there are complaints. We can set it
to 100 or 109 if the complaints look serious.
--
Best Wishes,
Ashutosh Bapat