Re: error-handling in ReorderBufferCommit() seems somewhat broken

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: error-handling in ReorderBufferCommit() seems somewhat broken
Date: 2017-08-30 00:19:47
Message-ID: 20170830001946.imxibsc7dxa3yewn@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-08-30 02:11:10 +0200, Tomas Vondra wrote:
>
>
> On 08/30/2017 01:34 AM, Andres Freund wrote:
> > Hi,
> >
> > On 2017-08-30 01:27:34 +0200, Tomas Vondra wrote:
> >> I've been investigating some failures in test_decoding regression tests,
> >> and it seems to me the error-handling in ReorderBufferCommit() is
> >> somewhat broken, leading to segfault crashes.
> >>
> >> The problematic part is this:
> >>
> >> PG_CATCH()
> >> {
> >> /*
> >> * Force cache invalidation to happen outside of a valid transaction
> >> * to prevent catalog access as we just caught an error.
> >> */
> >> AbortCurrentTransaction();
> >>
> >> /* make sure there's no cache pollution */
> >> ReorderBufferExecuteInvalidations(rb, txn);
> >>
> >> ...
> >>
> >> }
> >>
> >> Triggering it trivial - just add elog(ERROR,...) at the beginning of the
> >> PG_TRY() block.
> >
> > That's not really a valid thing to do, you should put it after the
> > BeginInternalSubTransaction()/StartTransactionCommand(). It's basically
> > assumed that those won't fail - arguably they should be outside of a
> > PG_TRY then, but that's a different matter. If you start decoding
> > outside of SQL failing before those will lead to rolling back the parent
> > tx...
> >
> >
> >> I suppose this is not quite intentional, but rather an example that
> >> error-handling code is an order of magnitude more complicated to write
> >> and test. I've only noticed as I'm investigating some regression
> >> failures on Postgres-XL 10, which does not support subtransactions and
> >> so the BeginInternalSubTransaction() call in the try branch always
> >> fails, triggering the issue.
> >
> > So, IIUC, there's no live problem in postgres core, besides some ugly &
> > undocumented assumptions?
> >
>
> I'm not really following your reasoning. You may very well be right that
> the BeginInternalSubTransaction() example is not quite valid on postgres
> core, but I don't see how that implies there can be no other errors in
> the PG_TRY block. It was merely an explanation how I noticed this issue.
>
> To make it absolutely clear, I claim that the PG_CATCH() block is
> demonstrably broken as it calls AbortCurrentTransaction() and then
> accesses already freed memory.

Yea, but that's not what happens normally. The txn memory isn't
allocated in the context created by the started [sub]transaction. I
think you're just seeing what you're seeing because an error thrown
before the BeginInternalSubTransaction() succeeds, will roll back the
*containing* transaction (the one that did the SQL SELECT * FROM
pg_*logical*() call), and free all the entire reorderbuffer stuff.

> The switch in the PG_TRY() blocks includes multiple elog(ERROR) calls in
> the switch, and AFAICS hitting any of them will have exactly the same
> effect as failure in BeginInternalSubTransaction().

No, absolutely not. Once the subtransaction starts, the
AbortCurrentTransaction() will abort that subtransaction, rather than
the containing one.

- Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-08-30 00:23:52 Re: error-handling in ReorderBufferCommit() seems somewhat broken
Previous Message Amit Langote 2017-08-30 00:13:16 Re: Tuple-routing for certain partitioned tables not working as expected