Re: Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org, Yury Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru>
Subject: Re: Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]
Date: 2016-08-04 20:19:25
Message-ID: 8848.1470341965@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Yury Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru> writes:
> Dmitry Ivanov wrote:
>>> Recently I've encountered a strange crash while calling elog(ERROR, "...")
>>> after the WaitForBackgroundWorkerShutdown() function. It turns out that
>>> _returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
>>> since they leave PG_exception_stack pointing to a local struct in a dead
>>> frame, which is an obvious UB. I've attached a patch which fixes this
>>> behavior in the aforementioned function, but there might be more errors
>>> like that elsewhere.

> Good catch. But in 9.6 it has been fixed by
> db0f6cad4884bd4c835156d3a720d9a79dbd63a9 commit.

Only accidentally, I'd say. I chose to push this into HEAD as well,
so that WaitForBackgroundWorkerShutdown would look more like
WaitForBackgroundWorkerStartup, and would not have a risk of future
breakage if someone adds code after the for-loop.

I was pretty worried about the "more errors like that" point, because
we recently fixed another similar mistake in plpython, cf 1d2fe56e4.
However, a search using the attached quick-hack script to look for
"return" etc between PG_TRY and PG_CATCH did not find any other
instances.

It'd be nice to have some automatic way of catching future mistakes
like this, but I'm not sure about a reliable way to do that.
One idea is to add an Assert to PG_CATCH:

#define PG_TRY() \
do { \
sigjmp_buf *save_exception_stack = PG_exception_stack; \
ErrorContextCallback *save_context_stack = error_context_stack; \
sigjmp_buf local_sigjmp_buf; \
if (sigsetjmp(local_sigjmp_buf, 0) == 0) \
{ \
PG_exception_stack = &local_sigjmp_buf

#define PG_CATCH() \
+ Assert(PG_exception_stack == &local_sigjmp_buf); \
} \
else \
{ \
PG_exception_stack = save_exception_stack; \
error_context_stack = save_context_stack

but there are a couple of disadvantages to this. One is that you don't
find out about a problem unless the bad code path is actually exercised.
I'm afraid that the most tempting places to do something like this would
be low-probability error cases, and thus that the Assert would do little
to catch them. (Case in point: the postmaster-death path fixed in this
patch would surely never have been caught by testing.) The other and
really worse problem is that when the Assert does fire, it's nowhere near
the scene of the crime, so while it may tell you there's a problem it
does not give much help in locating the bug.

Anyone have a better idea?

regards, tom lane

Attachment Content-Type Size
find_bad_pg_try text/x-shellscript 331 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-08-04 20:31:55 Re: New version numbering practices
Previous Message Peter Eisentraut 2016-08-04 20:17:31 Re: Keeping CURRENT_DATE and similar constructs in original format