Re: logical decoding of two-phase transactions

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: logical decoding of two-phase transactions
Date: 2017-03-29 00:47:08
Message-ID: CAMsr+YFOR=QAvk_2e+Yq8Jg01TBPwHCN1ccEsqbUxz5phFfFgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28 Mar. 2017 23:08, "Andres Freund" <andres(at)anarazel(dot)de> wrote:

>
> > >> I don't think its for us to say what the plugin is allowed to do. We
> > >> decided on a plugin architecture, so we have to trust that the plugin
> > >> author resolves the issues. We can document them so those choices are
> > >> clear.
> > >
> > > I don't think this is "plugin architecture" related. The output pluging
> > > can't do right here, this has to be solved at a higher level.
> >
> > That assertion is obviously false... the plugin can resolve this in
> > various ways, if we allow it.
>
> Handling it by breaking replication isn't handling it (e.g. timeouts in
> decoding etc).

IMO, if it's a rare condition and we can abort decoding then recover
cleanly and succeed on retry, that's OK. Not dissimilar to the deadlock
detector. But right now that's not the case, it's possible (however
artificially) to create prepared xacts for which decoding will stall and
not succeed.

> Handling it by rolling back *prepared* transactions
> (which are supposed to be guaranteed to succeed!), isn't either.
>

I agree, we can't rely on anything for which the only way to continue is to
rollback a prepared xact.

> > You can say that in your opinion you prefer to see this handled in
> > some higher level way, though it would be good to hear why and how.
>
> It's pretty obvious why: A bit of DDL by the user shouldn't lead to the
> issues mentioned above.
>

I agree that it shouldn't, and in fact DDL is the main part of why I want
2PC decoding.

What's surprised me is that I haven't actually been able to create any
situations where, with test_decoding, we have such a failure. Not unless I
manually LOCK TABLE pg_attribute, anyway.

Notably, we already disallow prepared xacts that make changes to the
relfilenodemap, which covers a lot of the problem cases like CLUSTERing
system tables.

> > Bottom line here is we shouldn't reject this patch on this point,
>
> I think it definitely has to be rejected because of that. And I didn't
> bring this up at the last minute, I repeatedly brought it up before.
> Both to Craig and Stas.
>

Yes, and I lost track of it while focusing on the catalog tuple visibility
issues. I warned Stas of this issue when he first mentioned an interest in
decoding of 2PC actually, but haven't kept a proper eye on it since.

Andres and I even discussed this back in the early BDR days, it's not new
and is part of why I poked Stas to try some DDL tests etc. The tests in the
initial patch didn't have enough coverage to trigger any issues - they
didn't actually test decoding of a 2pc xact while it was still in-progress
at all. But even once I added more tests I've actually been unable to
reproduce this in a realistic real world example.

Frankly I'm confused by that, since I would expect an AEL on some_table to
cause decoding of some_table to get stuck. It does not.

That doesn't mean we should accept failure cases and commit something with
holes in it. But it might inform our choices about how we solve those
issues.

> One way to fix this would be to allow decoding to acquire such locks
> (i.e. locks held by the prepared xact we're decoding) - there
> unfortunately are some practical issues with that (e.g. the locking code
> doesnt' necessarily expect a second non-exclusive locker, when there's
> an exclusive one), or we could add an exception to the locking code to
> simply not acquire such locks.
>

I've been meaning to see if we can use the parallel infrastructure's
session leader infrastructure for this, by making the 2pc fake-proc a
leader and making our decoding session inherit its locks. I haven't dug
into it to see if it's even remotely practical yet, and won't be able to
until early pg11.

We could proceed with the caveat that decoding plugins that use 2pc support
must defer decoding of 2pc xacts containing ddl until commit prepared, or
must take responsibility for ensuring (via a command filter, etc) that
xacts are safe to decode and 2pc lock xacts during decoding. But we're
likely to change the interface for all that when we iterate for pg11 and
I'd rather not carry more BC than we have to. Also, the patch has unsolved
issues with how it keeps track of whether an xact was output at prepare
time or not and suppresses output at commit time.

I'm inclined to shelve the patch for Pg 10. We've only got a couple of days
left, the tests are still pretty minimal. We have open issues around
locking, less than totally satisfactory abort handling, and potential to
skip replay of transactions for both prepare and commit prepared. It's not
ready to go. However, it's definitely to the point where with a little more
work it'll be practical to patch into variants of Pg until we can
mainstream it in Pg 11, which is nice.

--
Craig Ringer

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-03-29 00:52:00 Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
Previous Message Tsunakawa, Takayuki 2017-03-29 00:14:03 Re: Supporting huge pages on Windows