ECPG thread success (kind of) on Linux

From: Philip Yarra <philip(at)utiba(dot)com>
To: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
Cc: pgsql-interfaces(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: ECPG thread success (kind of) on Linux
Date: 2003-06-27 00:45:46
Message-ID: 200306271045.46789.philip@utiba.com
Lists: pgsql-hackers pgsql-interfaces

On Thu, 26 Jun 2003 11:19 am, Philip Yarra wrote:

> there appears to still be a problem
> occurring at "EXEC SQL DISCONNECT con_name". I'll look into it tonight if I
> can.

I did some more poking around last night, and believe I have found the issue:
RedHat Linux 7.3 (the only distro I have access to currently) ships with a
fairly challenged pthreads implementation. The default mutex type (which you
get from PTHREAD_MUTEX_INITIALIZER) is, according to the man page,
PTHREAD_MUTEX_FAST_NP which is not a recursive mutex. If a thread owns a
mutex and attempts to lock the mutex again, it will hang.
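
The relock behaviour can be probed without actually hanging. The following is
a minimal sketch, not code from ECPG: the function names are made up for
illustration, and it assumes a pthreads implementation that provides
PTHREAD_MUTEX_ERRORCHECK. It uses pthread_mutex_trylock() on a default mutex
and pthread_mutex_lock() on an error-checking mutex, both of which return an
error code instead of blocking when the calling thread already owns the lock.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK on strict compilers */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: try to re-acquire a default mutex we already hold.
 * trylock never blocks, so this reports the result instead of hanging. */
int relock_default_mutex(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int rc;

    rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    rc = pthread_mutex_trylock(&m);     /* same thread already owns m */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* Hypothetical helper: same experiment with an error-checking mutex,
 * where a second pthread_mutex_lock() fails rather than deadlocking. */
int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    rc = pthread_mutex_lock(&m);        /* relock by the owner */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

On a conforming implementation the first helper should return EBUSY and the
second EDEADLK; a plain pthread_mutex_lock() in the first case is what hangs.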

By replacing PTHREAD_MUTEX_INITIALIZER with PTHREAD_MUTEX_RECURSIVE_NP for the
two mutexes that are used recursively (debug_mutex and connections_mutex) I
got my sample app to work flawlessly on RedHat Linux 7.3.

Sadly, the _NP suffix is used to indicate non-portable, so of course my
FreeBSD box steadfastly refused to compile it. Darn.

The correct way to do this appears to be:

pthread_mutexattr_t mattr;
pthread_mutexattr_init(&mattr);
pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_RECURSIVE);
pthread_mutex_init(&connections_mutex, &mattr);

(will verify this against FreeBSD when I get home, and Tru64 man page
indicates support for this too, so I'll test that later). It won't work on
RedHat Linux 7.3... I guess something like:

#ifdef DODGY_PTHREADS
#define PTHREAD_MUTEX_RECURSIVE PTHREAD_MUTEX_RECURSIVE_NP
#endif

might do it... if we could detect the problem during configure. How is this
sort of detection handled in other cases (such as long long, etc)?
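
Putting the attribute-based initialisation and the fallback together, a
minimal sketch might look like the following. It is an illustration only:
init_recursive_mutex() is a made-up helper name, and DODGY_PTHREADS is the
hypothetical macro from above that a configure test would have to define.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_RECURSIVE on strict compilers */
#endif
#include <assert.h>
#include <pthread.h>

/* DODGY_PTHREADS is hypothetical: assumed to be set by configure on
 * platforms that only offer the non-portable _NP spelling. */
#ifdef DODGY_PTHREADS
#define PTHREAD_MUTEX_RECURSIVE PTHREAD_MUTEX_RECURSIVE_NP
#endif

/* Hypothetical helper: initialise *m as a recursive mutex.
 * Returns 0 on success or a pthreads error code. */
int init_recursive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex set up this way can be locked again by its owner, provided each lock
is matched by an unlock. The cost is that the mutex needs a run-time
initialisation call somewhere, rather than a static initialiser.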

The other solution I can think of is to eradicate the two recursive locks I
found.

One is simple: ECPGlog calls ECPGdebug, and both share debug_mutex. It ought to
be okay to use different mutexes for each of these functions (there's a risk
someone might call ECPGdebug while someone else is running through ECPGlog,
but I think it is less likely, since it is a debug mechanism.)

The second recursive lock I found is ECPGdisconnect calling
ECPGget_connection, both of which share a mutex. Would it be okay if we did
the following:

ECPGdisconnect() still locks connections_mutex, but calls
ECPGget_connection_nr() instead of ECPGget_connection()

ECPGget_connection() becomes a locking wrapper, which locks connections_mutex
then calls ECPGget_connection_nr()

ECPGget_connection_nr() is a non-locking function which implements what
ECPGget_connection() currently does.
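
As a rough sketch of that split (not the real ECPG code: the demo_ names and
the simplified connection struct are invented for illustration), the lookup
is done by a non-locking function, and the public entry points each take
connections_mutex exactly once:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified connection record; not the real ECPG struct. */
struct demo_connection {
    char name[32];
    struct demo_connection *next;
};

static pthread_mutex_t connections_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct demo_connection *all_connections = NULL;

/* Non-locking lookup: the caller must already hold connections_mutex. */
static struct demo_connection *demo_get_connection_nr(const char *name)
{
    struct demo_connection *con;

    for (con = all_connections; con != NULL; con = con->next)
        if (strcmp(con->name, name) == 0)
            return con;
    return NULL;
}

/* Locking wrapper for callers that do not hold the mutex. */
struct demo_connection *demo_get_connection(const char *name)
{
    struct demo_connection *con;

    pthread_mutex_lock(&connections_mutex);
    con = demo_get_connection_nr(name);
    pthread_mutex_unlock(&connections_mutex);
    return con;
}

/* Register a connection by name; returns 1 on success, 0 on failure. */
int demo_connect(const char *name)
{
    struct demo_connection *con = calloc(1, sizeof(*con));

    if (con == NULL)
        return 0;
    strncpy(con->name, name, sizeof(con->name) - 1);

    pthread_mutex_lock(&connections_mutex);
    con->next = all_connections;
    all_connections = con;
    pthread_mutex_unlock(&connections_mutex);
    return 1;
}

/* Locks once, then uses the non-locking lookup, so no recursive lock.
 * Returns 1 if the connection was found and removed, 0 otherwise. */
int demo_disconnect(const char *name)
{
    struct demo_connection *con, **link;
    int found = 0;

    pthread_mutex_lock(&connections_mutex);
    con = demo_get_connection_nr(name);
    if (con != NULL) {
        /* Walk the links to unhook con from the list. */
        for (link = &all_connections; *link != con; link = &(*link)->next)
            ;
        *link = con->next;
        free(con);
        found = 1;
    }
    pthread_mutex_unlock(&connections_mutex);
    return found;
}
```

With this layout a default (non-recursive) mutex is enough, because no code
path acquires connections_mutex while already holding it.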

I'm not sure if this sort of thing is okay (and there may be other recursive
locking scenarios that I haven't exercised yet).

What approach should I take? I'm leaning towards eradicating recursive locks,
unless someone has a good reason not to.

> All this does kinda raise the interesting question of why it worked at all
> on FreeBSD... probably different scheduling and blind luck, I suppose.

FreeBSD 4.8 must have PTHREAD_MUTEX_RECURSIVE as default mutex type. I'm a bit
concerned about FreeBSD 4.2 though - I noticed (before I blew it away in
favour of 4.8) that its pthreads implementation came from a package called
linuxthreads.tgz - it might have inherited the same problematic behaviour.
Could someone with access to or knowledge of FreeBSD 4.2 check what the
default mutex type is there?

Regards, Philip.

I can just see the ad for 7.3's pthreads implementation:
"Fast mutexes: zero to deadlock in 6.9 milliseconds!"
