Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions
Date: 2018-01-23 19:17:08
Message-ID: 5a86f4bb-2e65-b09e-3c04-5b1098574319@2ndquadrant.com
Lists: pgsql-hackers

On 23/01/18 16:01, Nikhil Sontakke wrote:
>> I must be missing something in this discussion, cos I don't see any
>> problems with this "other option".
>>
>> Surely we prepare the 2PCxact and then it is persistent. Yes,
>> potentially for an unbounded period of time. And it is (usually) up to
>> the XA manager to resolve that. 2PC interacts with transaction
>> management and yes, it can be slow. But the choice is slow and
>> consistent, or not. This would only be used with the full choice of
>> the user, just like synchronous_commit.

It's not about the transaction being persistent, but about the abort
command being blocked.

>>
>> In this case, we call the decoding plugin's precommit hook which would
>> then prepare the 2PCxact and set a non-persistent flag saying it is
>> being decoded. If decoding completes normally we release the lock and
>> commit. If decoding fails or the DBA has another reason to do so, we
>> provide a function that allows the flag to be unlocked. While it is
>> locked the 2PCxact cannot be aborted or committed.

The output plugin can't deal with precommit; that has to be handled
elsewhere, but in principle this is true.

>>
>> There is no danger of accidental abort because the prepare has persistent state.
>
> This concurrent abort handling while decoding is ongoing is turning
> out to be a complex affair.
>
> Thinking more about this, just to provide an example, we have a
> decoding plugin hook to determine if a GID for a 2PC was decoded at
> PREPARE time or COMMIT time as part of the 2PC logical decoding patch.
> We need that to determine the *same* static answer every time we see a
> specific GID while decoding across restarts; the plugin should know
> what it had done the last time around and should tell us the same
> later as well. It just occurred to me that as Simon also mentioned, it
> could/should also be the decoding plugin's responsibility to indicate
> if it's ok to go ahead with the abort of the transaction.
>
> So, we could consider adding a preAbort hook. That preAbort hook gets
> the GID, XID and other parameters as needed and tells us whether we
> can go ahead with the abort or if we need to wait out (maybe we pass
> in an ok_to_wait param as well). As an example, a plugin could lookup
> some shmem structure which points to the current transaction being
> decoded and does related processing to ensure that it stops decoding
> at a clean juncture, thus keeping the response time bounded to a
> maximum of one change record apply cycle. That passes the onus onto
> the plugin writers and keeps the core code around this concurrent
> abort handling clean.
>

Having this as a responsibility of the plugin sounds interesting. It
certainly narrows the scope in which we need to solve the abort issue.
For 2PC that may be okay, as we need to somehow interact with the
transaction manager anyway, as Simon noted. I am not sure it helps the
streaming use-case though, as there is not going to be any external
transaction management involved there.
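
To make the idea concrete, the callback Nikhil describes could be
declared along these lines (names and signature purely illustrative,
nothing like this exists yet):

/* Hypothetical pre-abort callback; all names here are made up. */
typedef bool (*LogicalDecodePreAbortCB) (struct LogicalDecodingContext *ctx,
                                         TransactionId xid,
                                         const char *gid,
                                         bool ok_to_wait);

Returning true would mean the abort may proceed immediately; returning
false (when ok_to_wait allows it) would ask us to wait until decoding
reaches a clean point.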

In any case, all this interlocking could potentially be made less
impactful by only doing it when we know the transaction did catalog
changes prior to the currently decoded change (which we know during
decoding), since that's the only time we are interested in whether it
aborted or not.

This all leads me to another idea. What if logical decoding provided an
API for "locking/unlocking" the currently decoded transaction against
abort? This function would then be called by both the decoding machinery
and the output plugin before any catalog read. The function can be smart
enough to be a no-op if the transaction is not running anymore (i.e. we
are not doing 2PC decoding or streaming) or when the transaction didn't
do any catalog modifications (we already have that info easily
accessible as a bool).
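
Roughly, and with made-up names just to illustrate the shape of it, I am
imagining something like:

/*
 * Hypothetical sketch only.  Interlock the transaction currently being
 * decoded against a concurrent abort, but only when that can actually
 * hurt us.
 */
bool
LogicalLockTransaction(ReorderBufferTXN *txn)
{
    /* Transaction no longer running (plain decoding after commit). */
    if (!TransactionIdIsInProgress(txn->xid))
        return true;

    /* Without catalog changes a concurrent abort can't hurt us. */
    if (!txn->has_catalog_changes)
        return true;

    /* ... take some short-lived (shmem) interlock against abort ... */

    return true;    /* false would mean "already aborted, stop decoding" */
}

void
LogicalUnlockTransaction(ReorderBufferTXN *txn)
{
    /* ... release the interlock taken above, if any ... */
}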

That would mean we'd never hold any kind of heavy lock for prolonged
periods of time (i.e. across network calls), but only during catalog
access and only when needed. It would also solve this for both 2PC and
streaming, and it would be easy for plugin authors to use. We'd just
document that the call has to be made before catalog access in an output
plugin; we could probably even Assert that the call was done.
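
In an output plugin that would then look roughly like this (again just a
sketch, using the hypothetical locking functions from above):

static void
my_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
             Relation relation, ReorderBufferChange *change)
{
    /* Must be called before any catalog access; could be Assert-checked. */
    if (!LogicalLockTransaction(txn))
        return;             /* transaction aborted under us, nothing to do */

    /* Anything needing catalog access (syscache lookups etc.) is safe here. */
    OutputPluginPrepareWrite(ctx, true);
    appendStringInfo(ctx->out, "change on %s",
                     RelationGetRelationName(relation));
    OutputPluginWrite(ctx, true);

    LogicalUnlockTransaction(txn);
}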

Thoughts?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
