> Most likely a waste of development effort --- have you got any evidence
> of a real effect here? With 200 max_connections the size of the arrays
> is still less than 10% of the space occupied by the buffers themselves,
> ergo there isn't going to be all that much cache-thrashing compared to
> what happens in the buffers themselves. You're going to be hard pressed
> to buy back the overhead of the hashing.
> It might be interesting to see whether we could shrink the refcount
> entries to int16 or int8. We'd need some scheme to deal with overflow,
> but given that the counts are now backed by ResourceOwner entries, maybe
> extra state could be kept in those entries to handle it.
I did some instrumentation coupled with pgbench/dbt2/views/join query runs
to find out the following:
(a) Maximum number of buffers pinned simultaneously by a backend: 6-9
(b) Maximum value of simultaneous pins on a given buffer by a backend: 4-6
(a) indicates that for a large shared_buffers value we will end up wasting
space on a big PrivateRefCount array per backend (the current allocation
is (int32 * shared_buffers)).
(b) indicates that the refcount to be tracked per buffer is a small
value, so Tom's suggestion of exploring int16 or int8 might be worthwhile.
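To put rough numbers on (a) and on the "less than 10%" figure quoted above, here is a small back-of-the-envelope sketch. It assumes the default 8 kB block size, and the helper names (flat_array_bytes, refcount_overhead_ratio) are made up for illustration; they are not existing PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_BLCKSZ 8192     /* assumption: default 8 kB buffer page size */

/*
 * Bytes used by one backend's flat PrivateRefCount array:
 * one int32 per shared buffer, whether or not the buffer is ever pinned.
 */
static size_t
flat_array_bytes(size_t nbuffers)
{
    assert(nbuffers > 0);
    return nbuffers * sizeof(int32_t);
}

/*
 * Total size of all backends' arrays relative to the size of the
 * shared buffers themselves.  The ratio does not depend on nbuffers:
 * it is simply nbackends * 4 / ASSUMED_BLCKSZ.
 */
static double
refcount_overhead_ratio(size_t nbuffers, size_t nbackends)
{
    double      arrays = (double) flat_array_bytes(nbuffers) * (double) nbackends;
    double      buffers = (double) nbuffers * (double) ASSUMED_BLCKSZ;

    return arrays / buffers;
}
```

With 131072 buffers (1 GB of shared_buffers under the 8 kB assumption), flat_array_bytes gives 512 kB per backend, and with 200 backends the ratio works out to 800/8192, a little under 10%, which matches the figure in the quoted text. Only a handful of those entries (6-9 in the runs above) are ever non-zero at the same time.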
Following is the Hash Table based proposal based on the above readings:
- Do away with allocating NBuffers sized PrivateRefCount array which is
an allocation of (NBuffers * int).
- Define Pvt_RefCnt_Size to be 64 (128?) or some such value so as to be
ahead of the above observed ranges. Define Overflow_Size to be 8 or some
similar small value to handle collisions.
- Define the following Hash Table entry to keep track of reference counts
int32 NextEnt; /* To handle collisions */
- Define a similar Overflow Table entry as above to handle collisions.
An array HashRefCntTable of such HashRefCntEnt entries, of size
Pvt_RefCnt_Size, will be initialized in the InitBufferPoolAccess function.
An OverflowTable of size Overflow_Size will be allocated. This array will be
resized dynamically (to 2 * the current Overflow_Size) to accommodate more
entries when it cannot hold further collisions from the main table.
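The tables described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: the bufid and refcount fields, the RefCntTable wrapper, the NO_ENTRY marker and the refcnt_* helpers are hypothetical names (only NextEnt appears in the entry definition above), malloc/realloc stand in for whatever allocator the backend would really use, and an entry whose count has dropped to zero is simply treated as reusable rather than being unlinked from its chain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PVT_REFCNT_SIZE 64      /* main table size suggested above */
#define OVERFLOW_SIZE    8      /* initial overflow table size suggested above */
#define NO_ENTRY       (-1)     /* hypothetical marker: end of a collision chain */

typedef struct HashRefCntEnt
{
    int32_t     bufid;          /* hypothetical: buffer this entry tracks */
    int32_t     refcount;       /* hypothetical: local pin count, 0 = reusable */
    int32_t     NextEnt;        /* to handle collisions: overflow index or NO_ENTRY */
} HashRefCntEnt;

typedef struct RefCntTable
{
    HashRefCntEnt slots[PVT_REFCNT_SIZE];
    HashRefCntEnt *overflow;    /* grows by doubling when full */
    int32_t     overflow_size;
    int32_t     overflow_used;
} RefCntTable;

static void
refcnt_init(RefCntTable *t)
{
    for (int i = 0; i < PVT_REFCNT_SIZE; i++)
        t->slots[i] = (HashRefCntEnt) {0, 0, NO_ENTRY};
    t->overflow_size = OVERFLOW_SIZE;
    t->overflow_used = 0;
    t->overflow = malloc(sizeof(HashRefCntEnt) * OVERFLOW_SIZE);
    assert(t->overflow != NULL);
}

/* Find the live entry for bufid; optionally claim one if none exists. */
static HashRefCntEnt *
refcnt_find(RefCntTable *t, int32_t bufid, int create)
{
    int32_t     home = bufid % PVT_REFCNT_SIZE;
    int32_t     tail = NO_ENTRY;    /* overflow index of chain tail; NO_ENTRY = home slot */
    HashRefCntEnt *e = &t->slots[home];
    HashRefCntEnt *reusable = NULL;

    assert(bufid >= 0);
    for (;;)
    {
        if (e->refcount > 0 && e->bufid == bufid)
            return e;
        if (e->refcount == 0 && reusable == NULL)
            reusable = e;
        if (e->NextEnt == NO_ENTRY)
            break;
        tail = e->NextEnt;
        e = &t->overflow[tail];
    }
    if (!create)
        return NULL;
    if (reusable == NULL)
    {
        if (t->overflow_used == t->overflow_size)
        {
            /* Overflow table is full: double it, then re-derive the tail pointer. */
            t->overflow_size *= 2;
            t->overflow = realloc(t->overflow,
                                  sizeof(HashRefCntEnt) * t->overflow_size);
            assert(t->overflow != NULL);
            e = (tail == NO_ENTRY) ? &t->slots[home] : &t->overflow[tail];
        }
        e->NextEnt = t->overflow_used;
        reusable = &t->overflow[t->overflow_used++];
        reusable->NextEnt = NO_ENTRY;
    }
    reusable->bufid = bufid;
    reusable->refcount = 0;
    return reusable;
}

static int32_t
refcnt_incr(RefCntTable *t, int32_t bufid)
{
    return ++refcnt_find(t, bufid, 1)->refcount;
}

static int32_t
refcnt_decr(RefCntTable *t, int32_t bufid)
{
    HashRefCntEnt *e = refcnt_find(t, bufid, 0);

    assert(e != NULL);          /* unpinning a buffer we never pinned is a bug */
    return --e->refcount;
}

static int32_t
refcnt_get(RefCntTable *t, int32_t bufid)
{
    HashRefCntEnt *e = refcnt_find(t, bufid, 0);

    return e ? e->refcount : 0;
}
```

In this sketch a pin would call refcnt_incr and an unpin refcnt_decr; the common case touches only the home slot, and the chain walk happens only when two pinned buffers share a slot. Pointers returned by refcnt_find are only valid until the next call that may grow the overflow table, which is why the helpers return counts rather than entries.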
We do not want the overhead of a costly hashing function. So we will use
(%Pvt_RefCnt_Size, i.e. modulo Pvt_RefCnt_Size) to get the index where the
entry needs to go. In short, our hash function is (bufid % Pvt_RefCnt_Size),
which should be a cheap enough operation.
Considering that at most 9-10 buffers will be needed at a time, the
probability of collisions will be low. Collisions will arise only if buffers
with ids (x, x + Pvt_RefCnt_Size, x + 2*Pvt_RefCnt_Size etc.) get used in the
same operation. This should be pretty rare.
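The hash and the collision condition above can be written out in a few lines. This is just a sketch assuming Pvt_RefCnt_Size = 64 and non-negative buffer ids; refcnt_hash, bufids_collide and count_collisions are illustrative names, not functions from bufmgr.c.

```c
#include <assert.h>
#include <stdint.h>

#define PVT_REFCNT_SIZE 64      /* assumed table size from the proposal */

/* The proposed hash: just the buffer id modulo the table size. */
static int32_t
refcnt_hash(int32_t bufid)
{
    assert(bufid >= 0);
    return bufid % PVT_REFCNT_SIZE;
}

/* Two ids collide exactly when they differ by a multiple of the table size. */
static int
bufids_collide(int32_t a, int32_t b)
{
    return refcnt_hash(a) == refcnt_hash(b);
}

/*
 * Count how many ids in a set of simultaneously pinned buffers would
 * land in a slot already taken by an earlier id in the same set.
 */
static int
count_collisions(const int32_t *bufids, int n)
{
    int         collisions = 0;

    for (int i = 1; i < n; i++)
    {
        for (int j = 0; j < i; j++)
        {
            if (bufids_collide(bufids[i], bufids[j]))
            {
                collisions++;
                break;          /* count each id at most once */
            }
        }
    }
    return collisions;
}
```

For example, ids 7, 71 and 135 all map to slot 7, while 7 and 8 never collide. Since 64 is a power of two, the modulo could also be computed as (bufid & (Pvt_RefCnt_Size - 1)) for non-negative ids, though compilers typically make that substitution themselves for unsigned operands.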
Functions PinBuffer, PinBuffer_Locked, IncrBufferRefCount, UnpinBuffer etc.
will be modified to consider the above mechanism properly. The changes will
be localized in the buf_init.c and bufmgr.c files only.