Re: Posix Shared Mem patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Posix Shared Mem patch
Date: 2012-06-27 02:40:54
Message-ID: 18958.1340764854@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Reflecting on this further, it seems to me that the main remaining
> failure modes are (1) file locking doesn't work, or (2) idiot DBA
> manually removes the lock file.

Oh, wait, I just remembered the really fatal problem here: to quote from
the SUS fcntl spec,

All locks associated with a file for a given process are removed
when a file descriptor for that file is closed by that process
or the process holding that file descriptor terminates.

That carefully says "a file descriptor", not "the file descriptor
through which the lock was acquired". Any close() referencing the lock
file will do. That means that it is possible for perfectly innocent
code --- for example, something that scans all files in the data
directory, as say pg_basebackup might do --- to cause a backend process
to lose its lock. When we looked at this before, it seemed like a
showstopper. Even if we carefully taught every directory-scanning loop
in postgres not to touch the lock file, we cannot expect that for
instance a pl/perl function wouldn't accidentally break things. And
99.999% of the time nobody would notice ... it would just be that last
0.001% of people that would be screwed.

Still, this discussion has yielded a useful advance, which is that we
now see how we might safely make use of lock mechanisms that don't
inherit across fork(). We just need something less broken than fcntl().

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2012-06-27 03:28:06 Re: WIP Patch: Selective binary conversion of CSV file foreign tables
Previous Message David Fetter 2012-06-27 02:11:12 Re: Catalog/Metadata consistency during changeset extraction from wal