Re: orangutan seizes up during isolation-check

From: Noah Misch <noah(at)leadboat(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, davec(at)postgresintl(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: orangutan seizes up during isolation-check
Date: 2014-09-15 14:22:13
Message-ID: 20140915142213.GB1332666@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 15, 2014 at 10:11:57AM +0300, Heikki Linnakangas wrote:
> On 09/15/2014 07:51 AM, Noah Misch wrote:
> >libintl replaces setlocale(). Its setlocale(LC_x, "") uses OS-specific APIs
> >to determine the default locale when $LANG and similar environment variables
> >are empty, as they are during "make check NO_LOCALE=1". On OS X, it calls[1]
> >CFLocaleCopyCurrent(), which in turn spins up a thread. See the end of this
> >message for the postmaster thread stacks active upon hitting a breakpoint set
> >at _dispatch_mgr_thread.
>
> Ugh. I'd call that a bug in libintl. setlocale() has no business to
> make the process multi-threaded.

Fair interpretation. libintl's options for mitigating the problem are even
more constrained, unfortunately.

> Do we have the same problem in backends? At a quick glance, aside
> from postmaster we only use PG_SETMASK(&BlockSig) in signal
> handlers, to prevent another signal handler from running
> concurrently.

In backends, we pass a specific value, not "", to pg_perm_setlocale(). (I
expect the problem in EXEC_BACKEND builds, but I doubt anyone uses that as a
production configuration on OS X.) Every frontend program does call
setlocale(LC_ALL, "") via set_pglocale_pgservice(). None use sigprocmask(),
but a few do use fork().

> >1. Fork, call setlocale(LC_x, "") in the child, pass back the effective locale
> > name through a pipe, and pass that name to setlocale() in the original
> > process. The short-lived child will get the extra threads, and the
> > postmaster will remain clean.

> >I'm skeptical of the value of looking up locale information using other OS X
> >facilities when the usual environment variables are inconclusive, but I see no
> >clear cause to reverse that decision now. I lean toward (1).
>
> Both of those are horrible hacks. And who's to say that calling
> setlocale(LC_x, "foo") won't also call some function that makes the
> process multi-threaded. If not in any current OS X release, it might
> still happen in a future one.

Right. Not just setlocale(); a large body of APIs could change that way. The
CAVEATS section[1] of the OS X fork() manual page suggests Apple had that in
mind at some point. To be 100% safe, we would write the postmaster as though
it's always multithreaded, including making[2] EXEC_BACKEND the only supported
configuration on OS X.

Assuming we don't do that anytime soon, I did have in mind to make the
postmaster raise an error if it becomes multithreaded. (Apple's
pthread_is_threaded_np() is usable for this check.) Then, at least, we'll
more-easily hear about new problems of this nature.

> One idea would be to use an extra pthread mutex or similar, in
> addition to PG_SETMASK(). Whenever you do PG_SETMASK(&BlockSig),
> also grab the mutex, and release it when you do
> PG_SETMASK(&UnBlockSig).

I had considered and tested such a thing, and it sufficed to clear orangutan's
present troubles. You still face undefined behavior after fork(). It's okay
in practice if libdispatch threads are careful to register pthread_atfork()
handlers for any relevant resources. It's a fair option, but running
setlocale(LC_x, "") in a short-lived child assumes less about the system
library implementation, seems no more of a hack to me, and isolates the
overhead at postmaster start.

[1] https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fork.2.html
[2] http://www.postgresql.org/message-id/1349280184-sup-326@alvh.no-ip.org

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2014-09-15 14:25:37 Re: delta relations in AFTER triggers
Previous Message Heikki Linnakangas 2014-09-15 14:13:22 Re: Triconsistent catalog declaration