Re: [HACKERS] logical decoding of two-phase transactions

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-03-30 17:27:18
Message-ID: f92d74fe-0e55-4c9b-a342-4c134591f1be@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30/03/18 09:56, Nikhil Sontakke wrote:
>
>> I as wondering how to hide this. Best idea I had so far would be to put
>> it in heap_beginscan (and index_beginscan given that catalog scans use
>> is as well) behind some condition. That would also improve performance
>> because locking would not need to happen for syscache hits. The problem
>> is however how to inform the heap_beginscan about the fact that we are
>> in 2PC decoding. We definitely don't want to change all the scan apis
>> for this. I wonder if we could add some kind of property to Snapshot
>> which would indicate this fact - logical decoding is using it's own
>> snapshots it could inject the information about being inside the 2PC
>> decoding.
>>
>
> The idea of adding that info in the Snapshot itself is interesting. We
> could introduce a logicalxid field in SnapshotData to point to the XID
> that the decoding backend is interested in. This could be added only
> for the 2PC case. Support in the future for in-progress transactions
> could use this field as well. If it's a valid XID, we could call
> LogicalLockTransaction/LogicalUnlockTransaction on that XID from
> heap_beginscan/head_endscan respectively. I can also look at what
> other *_beginscan APIs would need this as well.
>

So I have spent some significant time today thinking about this (the
issue in general not this specific idea). And I think this proposal does
not work either.

The problem is that we fundamentally want two things, not one. It's true
we want to block ABORT from finishing while we are reading catalogs, but
the other important part is that we want to bail gracefully when ABORT
happened for the transaction being decoded.

In other words,, if we do the locking transparently somewhere in the
scan or catalog read or similar there is no way to let the plugin know
that it should bail. So the locking code that's called from several
layers deep would have only one option, to ERROR. I don't think we want
to throw ERRORs when transaction which is being decoded has been aborted
as that disrupts the replication.

I think that we basically only have two options here that can satisfy
both blocking ABORT and bailing gracefully in case ABORT has happened.
Either the plugin has full control over locking (as in the patch), so
that it can bail when the locking function reports that the transaction
has aborted. Or we do the locking around the plugin calls, ie directly
in logical decoding callback wrappers or similar.

Both of these options have some disadvantages. Locking inside plugin
make the plugin code much more complex if it wants to support this. For
example if I as plugin author call any function that somewhere access
syscache, I have to do the locking around that function call. Locking
around plugin callbacks can hold he lock for longer periods of time
since plugins usually end up writing to network. I think for most
use-cases of 2PC decoding the latter is more useful as plugin should be
connected to some kind transaction management solution. Also the time
should be bounded by things like wal_sender_timeout (or
statement_timeout for SQL variant of decoding).

Note that I was initially advocating against locking around whole
callbacks when Nikhil originally came up with the idea, but after we
went over several other options here and given it a lot of thought I now
think it's probably least bad way we have available. At least until
somebody figures out how to solve all the issues around reading aborted
catalog changes, but that does seem like rather large project on its
own. And if we do locking around plugin callbacks now then we can easily
switch to that solution if it ever happens without anybody having to
rewrite the plugins.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-30 17:36:27 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message David Steele 2018-03-30 17:27:11 Re: PATCH: Configurable file mode mask