
Re: POSIX shared memory support

From:	Chris Marcellino <cmarcellino(at)apple(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org
Subject:	Re: POSIX shared memory support
Date:	2007-02-27 07:29:10
Message-ID:	B667A14D-EFA2-497B-9C5F-FEC70AFD7D57@apple.com
Lists:	pgsql-patches

On Feb 26, 2007, at 10:43 PM, Tom Lane wrote:

> Chris Marcellino <cmarcellino(at)apple(dot)com> writes:
>> The System V shared memory facilities provide a method to determine
>> who is attached to a shared memory segment.
>> This is used to prevent backends that were orphaned by crashed or
>> killed database processes from corrupting the database
>> as it is restarted. The same effect can be achieved using
>> the POSIX APIs,
>
> ... except that it can't ...
>
>> but since the POSIX library does not
>> have a way to check who is attached to a segment, atomic segment
>> creation must be used to ensure exclusive access to
>> the database.
>
> How does that fix the problem? If you can't actually tell whether
> someone is attached to an existing segment, then you're still up against
> the basic rock-and-a-hard-place issue: either you assume there is no one
> there (and corrupt your database if you're wrong) or you assume there is
> someone there (and force manual intervention by the DBA to recover after
> postmaster crashes). Neither of these alternatives is really acceptable.

Ignoring the case where backends are still alive in the database,
since they would require intervention or patience either way, there
are two options:
1) There is a postmaster/backend still running and you try to start
another postmaster: the unique segment cannot be closed and
atomically recreated and will fail as it does in the current
implementation.
2) There are no errant processes still in the database: the segment
can be closed and atomically recreated.
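The exclusive-create primitive that both cases rely on can be sketched as follows. This is a minimal illustration in Python, not code from the patch: it assumes the standard library's multiprocessing.shared_memory module, where create=True requests exclusive creation of the named segment and raises FileExistsError if the name is already taken. The function name try_create_exclusive and the segment name in the usage note are hypothetical. The sketch shows only that a second creation under the same name fails while the name exists; it does not show how to tell whether a live process is still attached.

```python
from multiprocessing import shared_memory


def try_create_exclusive(name, size=4096):
    """Create a named shared memory segment only if the name is free.

    Returns the new SharedMemory object, or None when a segment with
    this name already exists (someone else created it first).
    """
    try:
        # create=True asks for exclusive creation: the call fails
        # rather than silently attaching to an existing segment.
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        return None
```

For example, calling try_create_exclusive("pg_example") twice in a row should succeed the first time and return None the second time, until the first segment has been closed and unlinked.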

Try making a build with the patch, then start a postmaster for a
given folder, delete the lock file and start another postmaster (on a
different port) in that folder. Please let me know if I am
overlooking something.

>
>> In order for this to work, the key name used to open and create the
>> shared memory segment must be unique for each
>> data directory. This is done by using a strong hash of the canonical
>> form of the data directory’s pathname.
>
> "Strong hash" is not a guarantee, even if you could promise that you
> could get a unique canonical path, which I doubt you can. In any case
> this fails if the DBA decides to rename the directory on the fly
> (don't laugh; not only are there instances of that in our archives,
> there are people opining that we need to allow it --- even with the
> postmaster still running).

Strong hash is an effective guarantee that many computing paradigms
are based upon. The collision rate is astronomically small, and can
be made astronomically smaller with longer hashes.
(For MD5 there would need to be 10^15 postmasters on a server before
a collision is likely, and they all would need to have crashed and
left backends in the database, etc. )
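The collision estimate can be checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n keys drawn uniformly from a b-bit hash. The sketch below assumes the hash output is uniformly distributed (an idealization) and uses a hypothetical function name. Under that assumption, 10^15 keys with a 128-bit hash such as MD5 give a collision probability well under one in a hundred million, so the figure quoted above is on the conservative side; an even chance of a collision needs on the order of 10^19 keys.

```python
import math


def collision_probability(n, bits=128):
    """Birthday approximation: chance that at least two of n uniformly
    random `bits`-bit hash values are equal."""
    exponent = n * (n - 1) / 2 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (10**6, 10**15, 2.2e19):
        print(n, collision_probability(n))
```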

True, renaming is a problem that I had not anticipated at all.
Now that you mention it, hard links might be an issue on some
machines that don't canonicalize them to a unique path, since that
isn't required by the POSIX docs. Oh, the horrible degenerate cases.
Good point though.

Perhaps there is some other unique identifying feature of a given
database. A per-database persistent UUID would fit nicely here. It
could just be the shmem key.
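To make the difference concrete, here is a rough sketch comparing a key derived from the data directory's path with a key derived from a UUID stored inside the directory. Everything in it is an assumption for illustration: the function names, the "pg_" prefix, and the shmem_uuid file name are hypothetical, and SHA-256 stands in for whichever strong hash is chosen. The path-based key changes when the directory is renamed, while the stored UUID travels with it. Note that a copied directory would carry the same UUID, so this does not address every degenerate case.

```python
import hashlib
import os
import uuid


def path_key(data_dir):
    """Key derived from a hash of the canonicalized directory path."""
    canonical = os.path.realpath(data_dir)
    return "pg_" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def uuid_key(data_dir, filename="shmem_uuid"):
    """Key derived from a UUID persisted inside the data directory.

    The UUID is generated once, on first use, and read back afterwards.
    (A real implementation would also have to handle a reader seeing the
    file before the writer has finished writing it.)
    """
    path = os.path.join(data_dir, filename)
    try:
        # Exclusive create: only one caller gets to write the UUID.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        with open(path) as f:
            return "pg_" + f.read().strip()
    value = uuid.uuid4().hex
    with os.fdopen(fd, "w") as f:
        f.write(value)
    return "pg_" + value
```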

>
>> This also removes
>> any risk of other applications, or other databases’ memory
>> segments colliding with the current shared memory
>> segment, which conveniently simplifies the logic.
>
> How exactly does it remove that risk?

This is fruitless due to the renaming issue, but the hash isn't an
issue. I'm not sure that a hex string beginning with \pg_xxxxx is any
less readable than the shmem id integers that are generated ad-hoc by
the current implementation.

> I think you're wishfully-thinking
> that if you are creating an unreadable hash value then there will never
> be any collisions against someone else with the same touching faith that
> *his* unreadable hash values will never collide with anyone else's.

I'm flattered that you hold my coding abilities with such devout
conviction, but I assure you that cryptography, even in this limited
use, is based in rational thought :).
In addition, the astronomically unlikely collision isn't a risk as
the database can't be damaged. The admin would then need to clear the
lockfile, after he won the lottery twice and was struck by lightning
in his overturned car.

> Doesn't give me a lot of comfort.
> Not that it matters, since the
> approach is broken even if this specific assumption were sustainable.

Postmasters failing to load don't give me much comfort either, and
that isn't a pipe dream.

I suppose that the renaming issue relegates this patch to situations
where the database cannot be renamed, or hard linked to and started
more than once, yet where one still needs to start up databases
without restarting and without needing to control how many other
databases are consuming shmem on the same box.

Thanks for the reply,
Chris Marcellino

>
> regards, tom lane

In response to

Re: POSIX shared memory support at 2007-02-27 06:43:11 from Tom Lane

Responses

Re: POSIX shared memory support at 2007-02-28 05:21:21 from Tom Lane
