O(1) DSM handle operations

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: O(1) DSM handle operations
Date: 2017-03-27 21:13:20
Message-ID: CAEepm=3-EX8sTW7G5a_9TXABMZNpTuWeR1gtq5s+88OXGpo=vA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

This is just a thought for discussion, no patch attached...

DSM operations dsm_create(), dsm_attach(), dsm_unpin_segment() perform
linear searches of the dsm_control->item array for either a free slot
or a slot matching a given handle. Maybe no one thinks this is a
problem, because in practice the number of DSM slots you need to scan
should be something like number of backends * some small factor at
peak.

On the other hand it makes DSM slots a scarce resource and creates an
incentive for us never to allow many segments to be created, which has
knock-on implications, like dsa.c's geometric growth pattern: it
starts allocating bigger and bigger DSM segments so that we don't run
out of DSM slots which increases fragmentation (ie allocation of
memory you don't need).

Would it make sense to implement $SUBJECT like this?

1. Perform O(1) handle lookup by using N bits of dsm_handle to
identify a dsm_control->item slot number, and detect ABA slot reuse by
using the rest of the bits of dsm_handle to hold a counter which must
match a cycling counter in the control slot. This would replace the
current scheme where handles are random numbers.

2. Perform O(1) slot allocation with a freelist using a
next-free-slot link pointer in the slots themselves.

Perhaps the freelist from (2) should be a FIFO queue so that slots are
reused as slowly as possible, strengthening the counter scheme from
(1), or perhaps that's optimising for the wrong thing, I don't know.

The biggest problem with point (1) above would seem to be the SysV
implementation, where dsm_handle is transmogrified into key_t and some
of the bits might get chomped if the latter is smaller, which seems to
imply that at least the slot identification bits would need to be sure
to fit into key_t, and also the use of the special key_t sentinel
values IPC_PRIVATE which somehow needs to be avoided. Do any systems
really have sizeof(key_t) < 4?

Point (2) seems to be implementable independently of (1).

--
Thomas Munro
http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stas Kelvich 2017-03-27 21:19:29 Re: logical decoding of two-phase transactions
Previous Message Robert Haas 2017-03-27 20:29:56 Re: Patch: Write Amplification Reduction Method (WARM)