Re: problems on Solaris

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dave Page <dpage(at)pgadmin(dot)org>
Subject: Re: problems on Solaris
Date: 2015-06-24 12:42:10
Message-ID: 20150624124210.GN4797@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-31 01:09:18 +0200, Andres Freund wrote:
> On 2015-05-27 21:23:34 -0400, Robert Haas wrote:
> > > Oh wow, that's bad, and could explain a couple of the problems we're
> > > seeing. One possible way to fix is to replace the sequence with if
> > > (!TAS(spin)) S_UNLOCK();. But that'd mean TAS() has to be a barrier,
> > > even if the lock isn't free - which e.g. isn't the case for PowerPC's
> > > implementation :(
> >
> > Another possibility is to make the fallback barrier implementation a
> > system call, like maybe kill(PostmasterPid, 0).
>
> It's not necessarily true that all system calls are effective
> barriers. I'm e.g. doubtful that kill(..., 0) is one as it only performs
> local error checking. It might be that the process existence check
> includes a lock that's sufficient, but I would not like to rely on
> it. Sending an actual signal probably would be, but has the potential of
> disrupting postmaster progress.

I thought about various other syscalls we could use, and your proposal
seems to be the least bad. My idea of using waitpid() falls short because
it only works for child processes. I think the kind of systems that we
don't have barriers on are unlikely to use complex stuff like RCU to
manage access to process hierarchies, so the process existence check in
kill() most likely takes a lock there.

I reproduced the 'stuck' issue on x86 by #ifdef'ing out barrier support
- about 50% of the time test_shm_mq gets stuck. Replacing it with
kill(PostmasterPid, 0) "works". Unless somebody protests soon, that's
what I'm going to commit. It surely is better than easily reproducible
hangs.
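
For illustration, a minimal sketch of a kill()-based fallback barrier. The
names sketch_postmaster_pid and sketch_fallback_memory_barrier are
hypothetical stand-ins, not the actual PostgreSQL code, and the sketch
assumes (it is not guaranteed) that entering the kernel for the process
lookup orders memory accesses on the platforms that need the fallback.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical stand-in for the postmaster's PID; must be set before use. */
pid_t sketch_postmaster_pid = 0;

/*
 * Fallback "memory barrier" for platforms without native barrier support.
 *
 * kill(pid, 0) delivers no signal; it only performs the existence and
 * permission checks. The assumption here is that the kernel-side process
 * lookup acts as a full barrier. Because kill() is an external function
 * call, the compiler cannot move memory accesses across it either.
 *
 * Returns 0 if the call succeeded, otherwise the errno value from kill().
 */
int
sketch_fallback_memory_barrier(void)
{
    /* pid 0 or a negative pid would address process groups instead. */
    assert(sketch_postmaster_pid > 0);

    if (kill(sketch_postmaster_pid, 0) != 0)
        return errno;
    return 0;
}
```

A caller would set sketch_postmaster_pid once at startup and then invoke
sketch_fallback_memory_barrier() wherever a full barrier is required. A
failing call (for example ESRCH once the target process is gone) does not
by itself say whether the barrier effect took place, which is one more
reason to treat this strictly as a last resort.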

I'm wondering whether we should add a #warning to atomic.c if either the
fallback memory or compiler barrier is used? Might be annoying to people
using -Werror, but I doubt that's possible anyway on such old systems.
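
A rough sketch of what such a warning could look like. The macro names
SKETCH_MEMORY_BARRIER_EMULATION and SKETCH_COMPILER_BARRIER_EMULATION are
hypothetical placeholders for whatever the build defines when a fallback is
selected. Note that #warning is a compiler extension before C23, and that
GCC turns it into an error under -Werror unless something like
-Wno-error=cpp is also given.

```c
/*
 * Hypothetical feature macros, assumed to be defined by the build only
 * when no native barrier implementation exists for the platform.
 */
#if defined(SKETCH_MEMORY_BARRIER_EMULATION)
#warning "using fallback memory barrier emulation"
#endif

#if defined(SKETCH_COMPILER_BARRIER_EMULATION)
#warning "using fallback compiler barrier emulation"
#endif

/* Reports at run time whether either fallback was compiled in. */
int
sketch_uses_barrier_emulation(void)
{
#if defined(SKETCH_MEMORY_BARRIER_EMULATION) || \
    defined(SKETCH_COMPILER_BARRIER_EMULATION)
    return 1;
#else
    return 0;
#endif
}
```

With neither macro defined the file compiles silently and
sketch_uses_barrier_emulation() returns 0; defining either one produces the
diagnostic at compile time.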

Greetings,

Andres Freund
