Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: [PATCH] Detect escape of ErrorContextCallback stack pointers (and from PG_TRY() )
Date: 2020-09-04 06:13:23
Message-ID: CAMsr+YF_Tqcq5zrv3emZ_gvZnzur4aE9f4sj8Pndct-a5h1XYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 3 Sep 2020 at 22:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Craig Ringer <craig(at)2ndquadrant(dot)com> writes:
> > The attached patch series adds support for detecting coding errors where
> a
> > stack-allocated ErrorContextCallback is not popped from the
> > error_context_stack before the variable leaves scope.
>
> So my immediate thoughts about this are
>
> (1) It's mighty invasive for what it accomplishes. AFAIK we have
> had few of this class of bug, and so I'm not excited about touching
> every callback use in the tree to add defenses. It's also not great
> that future code additions won't be protected unless they remember
> to add these magic annotations. The PG_TRY application is better since
> it's wrapped into the existing macro.
>

I agree with you there. I'd actually like to change how we set up
errcontext callbacks anyway, as I think they're ugly, verbose and
error-prone.

See patch 6 for a RFC on that.

With regards to PG_TRY() you may note that patch 5 adds a similar check to
detect escapes from PG_TRY() / PG_CATCH() bodies.

(2) I don't like that this is adding runtime overhead to try to detect
> such bugs.
>

Right. Only in cassert builds, though.

Frankly I'd be happy enough even having it as something I can turn on when
I wanted it. I've had a heck of a time tracking down errcontext escapes in
the past. I wrote it for an extension, then figured I'd submit it to core
and see if anyone wanted it.

Even if we don't adopt it in PostgreSQL, now it's out there to help out
others debugging similar issues.

(3) I think it's a complete failure that such a bug will only be
> detected if the faulty code path is actually taken. I think it's
> quite likely that any such bugs that are lurking are in "can't
> happen", or at least hard-to-hit, corner cases --- if they were in
> routinely tested paths, we'd have noticed them by now. So it'd be far
> more helpful if we had a static-analysis way to detect such issues.
>
> Thinking about (3), I wonder whether there's a way to instruct Coverity
> to detect such errors.
>

I think calling it a "complete failure" is ridiculous. By the same
rationale, we shouldn't bother with Assert(...) at all. But I agree that
it's definitely far from as good as having a statically verifiable check
would be, and this *should* be something we can statically verify.

I actually had a pretty good look around for static analysis options to see
if I could find anything that might help us out before I landed up with
this approach.

The sticking point is the API we use. By using auto stack variables (and
doing so in pure C where they cannot have destructors) and using direct
assignments to globals, we cannot use the majority of static analysis
annotations since they tend to be aimed at function-based APIs. And for
some reason most static analysis tools appear to be poor at discovering
escapes of pointers to stack variables leaving scope, at least unless you
use annotated functions that record ownership transfers. Which we can't
because ... direct assignment.

It's one of the reasons I want to wrap the errcontext API per patch 6 in
the above series. The current API is extremely verbose, hard to validate
and check, and difficult to apply static analysis to.

If we had error context setup/teardown macros we could implement them using
static inline functions and annotate them as appropriate for the target
toolchain and available static analysis tools.

So what do you think of patch 6?

I'll try tweaking the updated API to add annotateable functions and see if
that helps static analysis tools detect issues, then report back.

I personally think it'd be well worth adopting a wrapped API and
simultaneously renaming ErrorContextCallback to ErrorContextCallbackData to
ensure that code must be updated to compile with the changes. It'd be
simple to provide a backwards compatibility header that extension authors
can copy into their trees so they can use the new API when building against
old postgres, greatly reducing the impact on extension authors. (We do that
sort of thing constantly in pglogical and BDR).

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-09-04 06:33:36 Re: Compatible defaults for LEAD/LAG
Previous Message Tom Lane 2020-09-04 05:52:20 Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur