Re: [HACKERS] logical decoding of two-phase transactions

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2017-11-24 06:41:22
Message-ID: CAMsr+YH_x7Lrz3arUewzAk4=FZ-+hdum-eW7gZJBAEmzo937eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24 November 2017 at 13:44, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
wrote:

>
> > This could create an inconsistent view of the catalogs if our prepared
> txn
> > did any DDL. For example, we might've updated a pg_class row, so we
> created
> > a new row and set xmax on the old one. Vacuum may merrily remove our new
> row
> > so there's no way we can find the correct data anymore, we'd have to find
> > the outdated row or no row. By my reading of HeapTupleSatisfiesMVCC we'll
> > see the old pg_class tuple.
> >
> > Similar issues apply for pg_attribute etc etc. We might try to decode a
> > record according to the wrong table structure because relcache lookups
> > performed by the plugin will report outdated information.
> >
>
> We actually do the decoding in a PG_TRY/CATCH block, so if there are
> any errors we
> can clean those up in the CATCH block. If it's a prepared transaction
> then we can send
> an ABORT to the remote side to clean itself up.
>

Yeah. I suspect it might not always ERROR gracefully though.

> > How practical is adding a lock class?
>
> Am open to suggestions. This looks like it could work decently.

It looks amazingly simple from here. Which probably means there's more to
it that I haven't seen yet. I could use advice from someone who knows the
locking subsystem better.

> Yes, that makes sense in case of ROLLBACK. If we find a missing GID
> for a COMMIT PREPARE we are in for some trouble.
>

I agree. But it's really down to the apply worker / plugin to set policy
there, I think. It's not the 2PC decoding support's problem.

I'd argue that a plugin that wishes to strictly track and match 2PC aborts
with the subsequent ROLLBACK PREPARED could do so by recording the abort
locally. It need not rely on faked-up 2pc xacts from the output plugin.
Though it might choose to create them on the downstream as its method of
tracking aborts.

In other words, we don't need the logical decoding infrastructure's help
here. It doesn't have to fake up 2PC xacts for us. Output plugins/apply
workers that want to can do it themselves, and those that don't can ignore
rollback prepared for non-matched GIDs instead.

> We'd need a race-free way to do that though, so I think we'd have to
> extend
> > FinishPreparedTransaction and LockGXact with some kind of missing_ok
> flag. I
> > doubt that'd be controversial.
> >
>
> Sure.

I reckon that should be in-scope for this patch, and pretty clearly useful.
Also simple.

>
> > - It's really important that the hook that decides whether to decode an
> xact
> > at prepare or commit prepared time reports the same answer each and every
> > time, including if it's called after a prepared txn has committed. It
> > probably can't look at anything more than the xact's origin replica
> > identity, xid, and gid. This also means we need to know the gid of
> prepared
> > txns when processing their commit record, so we can tell logical decoding
> > whether we already sent the data to the client at prepare-transaction
> time,
> > or if we should send it at commit-prepared time instead.
> >
>
> We already have a filter_prepare_cb hook in place for this. TBH, I
> don't think this patch needs to worry about the internals of that
> hook. Whatever it returns, if it returns the same value everytime then
> we should be good from the logical decoding perspective.
>

I agree. I meant that it should try to pass only info that's accessible at
both PREPARE TRANSACTION and COMMIT PREPARED time, and we should document
the importance of returning a consistent result. In particular, it's always
wrong to examine the current twophase state when deciding what to return.

> I think, if we encode the logic in the GID itself, then it will be
> good and consistent everytime. For example, if the hook sees a GID
> with the prefix '_$Logical_', then it knows it has to PREPARE it.
> Others can be decoded at commit time.
>

Yep. We can also safely tell the hook:

* the xid
* whether the xact has made catalog changes (since we know that at prepare
and commit time)

but probably not much else.

> > - You need to flush the syscache when you finish decoding a PREPARE
> > TRANSACTION of an xact that made catalog changes, unless it's immediately
> > followed by COMMIT PREPARED of the same xid. Because xacts between the
> two
> > cannot see changes the prepared xact made to the catalogs.
> >
> > - For the same reason we need to ensure that the historic snapshot used
> to
> > decode a 2PC xact that made catalog changes isn't then used for
> subsequent
> > xacts between the prepare and commit. They'd see the uncommitted
> catalogs of
> > the prepared xact.
> >
>
> Yes, we will do TeardownHistoricSnapshot and syscache flush as part of
> the cleanup for 2PC transactions.
>

Great.

Thanks.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 高增琦 2017-11-24 07:17:24 Re: How is the PostgreSQL debuginfo file generated
Previous Message Nikhil Sontakke 2017-11-24 05:44:52 Re: [HACKERS] logical decoding of two-phase transactions