Thread-unsafe coding in ecpg

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Michael Meskes <meskes(at)postgresql(dot)org>
Subject: Thread-unsafe coding in ecpg
Date: 2019-01-18 03:54:57
Message-ID: 31420.1547783697@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've found that a couple of different OpenBSD 6.4 machines fall over
badly in the ecpg regression tests, with output like

test sql/parser ... ok
test thread/thread ... stdout stderr FAILED (test process was terminated by signal 6: Abort trap)
test thread/thread_implicit ... stdout FAILED (test process was terminated by signal 10: Bus error)
test thread/prep ... ok (test process was terminated by signal 10: Bus error)
test thread/alloc ... stderr FAILED (test process was terminated by signal 6: Abort trap)
test thread/descriptor ... ok

It's somewhat variable as to which tests fail, but it's always thread
tests. Examining the core dumps shows traces like

#0 thrkill () at -:3
#1 0x00000c04f427dd6e in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51
#2 0x00000c04f425f7e9 in wrterror (d=Variable "d" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:291
#3 0x00000c04f42628fb in find_chunknum (d=Variable "d" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:1043
#4 0x00000c04f425fe23 in ofree (argpool=Variable "argpool" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:1359
#5 0x00000c04f425f8ec in free (ptr=0xc04df0e26e0)
at /usr/src/lib/libc/stdlib/malloc.c:1419
#6 0x00000c04f427ec83 in freegl (oldgl=0xc05a022d080)
at /usr/src/lib/libc/locale/setlocale.c:32
#7 0x00000c04f427eb49 in _libc_setlocale (category=4,
locname=0xc059b605180 "C") at /usr/src/lib/libc/locale/setlocale.c:177
#8 0x00000c0531a6f955 in ecpg_do_epilogue (stmt=0xc0587bb0c00)
at execute.c:1986
#9 0x00000c0531a6fa65 in ecpg_do (lineno=Variable "lineno" is not available.
) at execute.c:2018
#10 0x00000c0531a6fb31 in ECPGdo (lineno=Variable "lineno" is not available.
) at execute.c:2037
#11 0x00000c02a9f00b19 in test_thread (arg=Variable "arg" is not available.
) at thread.pgc:131
#12 0x00000c04b180b26e in _rthread_start (v=Variable "v" is not available.
)
at /usr/src/lib/librthread/rthread.c:96
#13 0x00000c04f42ba77b in __tfork_thread ()
at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#14 0x0000000000000000 in ?? ()

or this one:

#0 _libc_strlcpy (dst=0x2c61e133ee0 "C",
src=0xdfdfdfdfdfdfdfdf <Address 0xdfdfdfdfdfdfdfdf out of bounds>,
dsize=256) at /usr/src/lib/libc/string/strlcpy.c:36
#1 0x000002c61de99a71 in _libc_setlocale (category=4, locname=0x0)
from /usr/lib/libc.so.92.5
#2 0x000002c5ae2693a8 in ecpg_do_prologue (lineno=59, compat=0,
force_indicator=1, connection_name=0x0, questionmarks=false,
statement_type=ECPGst_execute, query=0x2c333701418 "i",
args=0x2c61a96f6d0, stmt_out=0x2c61a96f5e0) at execute.c:1776
#3 0x000002c5ae269a20 in ecpg_do (lineno=Variable "lineno" is not available.
) at execute.c:2001
#4 0x000002c5ae269b31 in ECPGdo (lineno=Variable "lineno" is not available.
) at execute.c:2037
#5 0x000002c333600b47 in fn (arg=Variable "arg" is not available.
) at prep.pgc:59
#6 0x000002c56a00b26e in _rthread_start (v=Variable "v" is not available.
)
at /usr/src/lib/librthread/rthread.c:96
#7 0x000002c61ded577b in __tfork_thread ()
at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#8 0x0000000000000000 in ?? ()

The common denominator is always a call to setlocale(), and
that generally is calling malloc or free or some other libc
function that is unhappy. When output appears on stderr,
it's usually free complaining about double-frees or some such.

So my conclusion is that this version of setlocale() has some
thread-safety issues. I was all set to go file a bug report
when I noticed this in the POSIX spec for setlocale:

The setlocale() function need not be thread-safe.

as well as this:

The global locale established using setlocale() shall only be used
in threads for which no current locale has been set using
uselocale() or whose current locale has been set to the global
locale using uselocale(LC_GLOBAL_LOCALE).

IOW, not only is setlocale() not necessarily thread-safe itself,
but using it to change locales in a multithread program is seriously
unsafe because of concurrent effects on other threads.

Therefore, it's plain crazy for ecpg to be calling setlocale() inside
threaded code. It looks to me like what ecpg is doing is trying to defend
itself against non-C LC_NUMERIC settings, which is laudable, but this
implementation of that is totally unsafe.

Don't know what's the best way out of this. The simplest thing would
be to just remove that code and document that you'd better run ecpg
in LC_NUMERIC locale, but it'd be nice if we could do better.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2019-01-18 04:08:15 Re: current_logfiles not following group access and instead follows log_file_mode permissions
Previous Message Masahiko Sawada 2019-01-18 03:50:15 Fix function name in commet in vacuumlazy.c