Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )
Date: 2020-09-09 06:16:13
Message-ID: CAMsr+YHX_6=ZPLygVyOyZAA6uScHgTAsT6bn3h7YUPRNcDMXBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 8 Sep 2020 at 10:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> So that's good. However, a similar experiment with returning from inside
> a PG_TRY does *not* produce a warning. I suspect maybe the compiler
> throws up its hands when it sees sigsetjmp?
>

I find that odd myself, given that in PG_TRY() we:

PG_exception_stack = &_local_sigjmp_buf;

and a return within that block should count as a pointer to
_local_sigjmp_buf escaping. Yet I confirm that I don't get a warning on
clang-analyzer 10.0.0 (fedora 32) in my builds either. Even with -O0.

TL;DR:
-------

It's not due to sigsetjmp() at all though; see
https://github.com/ringerc/scrapcode/blob/master/c/clang_return_stack_checks/mask_local_escape.c
for part of what's going on.

I haven't worked out why sometimes an unrelated stack escape gets hidden
though.

Attached patches demonstrate tests and are DEFINITELY NOT proposals for
core!

SUMMARY
-----

I found per test details below (links and demo patches supplied, patches
NOT proposed for inclusion in pg) that scan-build seems to be able to track
a local escape via simple indirection such as copying the address to a
local first, but cannot track it when it is assigned to a struct member in
a global pointer if that global pointer is then replaced, even if it's
restored immediately.

Demonstration code to illustrate this is at
https://github.com/ringerc/scrapcode/blob/master/c/clang_return_stack_checks/mask_local_escape.c

This produces no warnings from scan-build but leaks a pointer to an
auto-variable on the stack, then dereferences it to read garbage. I
deliberately clobber the stack to ensure that it's predictable garbage, so
we get:

$ make run_mask_local_escape
./mask_local_escape
&g = 0x7fff1b31e6bc
&g has escaped: 0x7fff1b31e6bc
value is: 0x7f7f7f7f

I think we're using a similar pattern for PG_error_stack, and that's why we
don't get any warnings.

In this case valgrind detects the issue when the bogus data is actually
read. Run "valgrind ./mask_local_escape". But I'm pretty sure I've seen
other cases where it does not.

So I'm wondering if the explicit __attribute__((callback(cbfn))) approach
has some merits after all, for cassert builds. Assuming we care about this
error, since it's a bit of an academic exercise at this point. I just
needed to understand what was going on and why there was no warning.

DETAILS OF TESTS
-----

If I wrap the sigjmp_buf in a struct defined in Pg itself and allow that to
escape I see the same behaviour, so it's not specific to sigjmp_buf.

I tried adding a dummy guard variable to PG_TRY() and PG_END_TRY() solely
for the purpose of tracking this sort of escape. But I notice that a
warning from scan-build is only emitted in PG_CATCH() or PG_FINALLY(),
never from inside the PG_TRY() body. I'd expect to see a warning from
either branch, as it's escaping either way.

scan-build will raise other warnings from within the PG_CATCH() block, such
as if I introduce a

+ int foo = 0;
+ fprintf(stdout, "%d", 10/foo);

so it's not that scan-build as a whole is failing to take the sigsetjmp()
== 0 branch. It raises *other* warnings. This can also be verified by
replacing sigsetjmp() with a dummy static inline function that always takes
one branch or the other.

I landed up writing the experiment code at
https://github.com/ringerc/scrapcode/blob/master/c/clang_return_stack_checks/
to explore the behaviour some more.

The file sigjmp_escape.c explores various combinations of guard variables
and early returns. The file sigjmp_escape_hdr.c and its companions
sigjmp_escape_hdr_try.{c,h} check whether indirection via separate header
and compilation unit makes any difference (it didn't).

Initially I found that in my PG_TRY()-alike test code I got a warning like:

sigjmp_escape_hdr.c:36:4: warning: Address of stack memory associated with
local variable '_local_sigjmp_buf' is still referred to by the global
variable 'TEST_exception_stack' upon returning to the caller. This will be
a dangling reference
return;
^~~~~~

just like I want to see from inside a PG_TRY() block.

But with some further testing I noticed that it seems the checker may not
notice escapes where there's even the simplest level of indirection, and
also that one candidate escape may mask another. I'm not totally sure yet.

In the sigjmp_escape.c test, if I build it with both inner and outer guards
(-DUSE_INNER_GUARDS), scan-build will only complain about the inner guard
pointer g_branch_1 escaping. It will not complain about the stack pointer
to the outer guard escaping, nor about the possibility of
should_return_early causing the sigjmp_buf to escape into the global
sigjmp_buf * jbuf. Yet running

./sigjmp_escape 1 1

shows that both can occur.

This still doesn't explain how it *also* misses the escape of &b into jbuf.
Is it losing track of the unrelated escape when it stops tracking the first
one?

The test case sigjmp_escape_hdr doesn't attempt to use the guards, and it
reports the sigjmp_buf escape fine as shown above.

If sigjmp_escape.c is built with -DUSE_INNER_GUARDS -DPOP_ONLY_INNER_GUARDS
then it raises no warnings at all, BUT has both the sigjmp_buf leak AND a
guard variable stack escape as seen by:

./sigjmp_escape_inner_pop 1 1

We've seen above that it loses the outer guard pointer escape because it
doesn't track any sort of indirection of assignments.

What's a bit surprising is that it *also* fails to complain about the
escape of the sigjmp_buf pointer &b into the global variable jbuf in this
case, even though it successfully detects the escape with other
combinations of guard uses. It's like it can't track multiple things at
once properly.

RELATED: detecting return from PG_FINALLY()
--------

Note that even if scan-build did detect the escape of the sigjmp_buf,
returning from PG_FINALLY() is a coding error that we cannot detect this
way since we've restored PG_errror_stack so there's no local auto ptr to
escape. To do that we'd need our own stack that we pushed in PG_TRY() and
didn't pop until PG_END_TRY(), like in the demo patch attached. That works,
but adds another local to each PG_TRY() that's otherwise useless. If we
really felt like it we could use it to produce a poor-man's backtrace
through each PG_TRY() at the cost of one sizeof(void) per PG_TRY() + some
constdata storage, but that seems like a solution looking for a problem.

Also, AFAICS it's actually harmless, if ugly, to return from PG_CATCH(). It
should still be discouraged.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

Attachment Content-Type Size
v999-0001-Deliberately-break-something-so-we-can-check-it.patch text/x-patch 1.5 KB
v999-0003-Add-guards-to-detect-return-from-between-PG_TRY.patch text/x-patch 2.5 KB
v999-0002-Show-that-it-s-not-special-casing-of-sigjmp_buf.patch text/x-patch 7.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2020-09-09 06:16:20 Re: Inconsistent Japanese name order in v13 contributors list
Previous Message tsunakawa.takay@fujitsu.com 2020-09-09 05:58:10 RE: Inconsistent Japanese name order in v13 contributors list