In the bufmgr code we keep track of reference counts on individual
buffers. We do this with a local array that grows in proportion to the
size of shared_buffers.
When shared_buffers is set high and there is also a very large number of
connected users, this wastes considerable RAM that would otherwise have
been available to OS buffers.
Each element of the array is an int32 (4 bytes), so with 200 connected
users and shared_buffers = 250,000 this would use 200 MB of RAM. For 1000
connected users and 500,000 buffers this would use 2 GB of RAM. This
would clearly be a problem when attempting to use larger shared_buffers
to increase performance, which some people believe is useful. IMHO we
should support those people who find such settings beneficial.
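
As a quick check of those figures, here is a minimal sketch of the
arithmetic. The function name private_refcount_bytes is hypothetical and
not part of bufmgr; it assumes one int32 per shared buffer per backend
and ignores any allocator overhead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total bytes consumed across all backends by the per-backend reference
 * count arrays. Hypothetical helper for illustration only: assumes one
 * int32 element per shared buffer and one array per connected backend.
 */
uint64_t
private_refcount_bytes(uint64_t n_backends, uint64_t n_buffers)
{
    uint64_t per_backend = n_buffers * (uint64_t) sizeof(int32_t);

    /* Guard against 64-bit overflow for absurd inputs. */
    assert(n_backends == 0 || per_backend <= UINT64_MAX / n_backends);

    return n_backends * per_backend;
}
```

Calling it with (200, 250000) gives 200,000,000 bytes, and with
(1000, 500000) gives 2,000,000,000 bytes, matching the 200 MB and 2 GB
quoted above.
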
The number of pins held by any backend at once is mostly 1-2, but could
in some circumstances go as high as the number of tables in a join. So
the vast majority of the array would be unused, yet we require random
access to it. This will cause cache churning and some delays while we
request the needed parts of the array from main memory.
Under specific conditions, I propose to replace the array with a hash
table, designed with a custom hash function that would map the pins held
onto just 16 hash buckets. That is small enough to fit completely in
cache, yet large enough to avoid the overhead of collisions in most cases.
Should the hash table overflow at any time, it would be replaced by the
standard array as a fallback (with all code as now).
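
To make the idea concrete, here is a rough sketch of a 16-slot table that
spills to the full array on overflow. Everything in it is an assumption
for illustration: the names (PinTracker, pin_adjust, pin_count), the
placeholder hash (low four bits of the buffer id), and the choice to
resolve collisions by probing the other slots. The real custom hash
function and collision handling remain to be designed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PIN_SLOTS 16            /* hypothetical name; 16 buckets as proposed */
#define PIN_EMPTY (-1)          /* marks an unused slot */

typedef struct { int32_t buf_id; int32_t refcount; } PinSlot;

typedef struct {
    int32_t  n_buffers;
    int32_t *array;             /* NULL until the small table overflows */
    PinSlot  slots[PIN_SLOTS];
} PinTracker;

void pin_init(PinTracker *t, int32_t n_buffers)
{
    t->n_buffers = n_buffers;
    t->array = NULL;
    for (int i = 0; i < PIN_SLOTS; i++) {
        t->slots[i].buf_id = PIN_EMPTY;
        t->slots[i].refcount = 0;
    }
}

/* Placeholder hash (an assumption): low four bits of the buffer id. */
static unsigned pin_hash(int32_t buf_id)
{
    return (unsigned) buf_id & (PIN_SLOTS - 1);
}

/* Scan all 16 slots from the home slot; optionally claim a free one. */
static PinSlot *pin_find(PinTracker *t, int32_t buf_id, bool create)
{
    PinSlot *free_slot = NULL;
    unsigned home = pin_hash(buf_id);

    for (unsigned i = 0; i < PIN_SLOTS; i++) {
        PinSlot *s = &t->slots[(home + i) & (PIN_SLOTS - 1)];
        if (s->buf_id == buf_id)
            return s;
        if (s->buf_id == PIN_EMPTY && free_slot == NULL)
            free_slot = s;
    }
    if (!create || free_slot == NULL)
        return NULL;            /* not found, or table is full */
    free_slot->buf_id = buf_id;
    free_slot->refcount = 0;
    return free_slot;
}

/* Fallback: allocate the standard full-size array and copy counts over. */
static bool pin_spill_to_array(PinTracker *t)
{
    t->array = calloc((size_t) t->n_buffers, sizeof(int32_t));
    if (t->array == NULL)
        return false;
    for (int i = 0; i < PIN_SLOTS; i++)
        if (t->slots[i].buf_id != PIN_EMPTY)
            t->array[t->slots[i].buf_id] = t->slots[i].refcount;
    return true;
}

/* Add delta (+1 pin, -1 unpin) to this backend's count for buf_id. */
bool pin_adjust(PinTracker *t, int32_t buf_id, int32_t delta)
{
    assert(buf_id >= 0 && buf_id < t->n_buffers);
    if (t->array == NULL) {
        PinSlot *s = pin_find(t, buf_id, delta > 0);
        if (s != NULL) {
            s->refcount += delta;
            if (s->refcount == 0)
                s->buf_id = PIN_EMPTY;   /* release the slot */
            return true;
        }
        if (delta <= 0)
            return false;       /* unpin of a buffer we do not hold */
        if (!pin_spill_to_array(t))
            return false;       /* out of memory */
    }
    t->array[buf_id] += delta;
    return true;
}

int32_t pin_count(PinTracker *t, int32_t buf_id)
{
    if (t->array != NULL)
        return t->array[buf_id];
    PinSlot *s = pin_find(t, buf_id, false);
    return s ? s->refcount : 0;
}
```

In this sketch a backend that never holds more than 16 distinct pins at
once never allocates the large array, and once it does spill it stays on
the array path, which mirrors the "all code as now" fallback.
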
The gain from the additional available memory and the local cache
efficiency will at some point outweigh the cost of indirect access to
private ref counts. The suggested conditions to use the hash table would
be when NBuffers > 100000 and max_connections > 200, but those are just
gut feelings. We can measure that point if the idea is acceptable.