Re: [HACKERS] logical decoding of two-phase transactions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-07-20 14:58:36
Message-ID: 20180720145836.gxwhbftuoyx5h4gc@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018-07-20 12:13:19 +0530, Nikhil Sontakke wrote:
> Hi Andres,
>
>
> > So what if we, at the begin / end of cache miss handling, re-check if
> > the to-be-decoded transaction is still in-progress (or has
> > committed). And we throw an error if that happened. That error is then
> > caught in reorderbuffer, the in-progress-xact aborted callback is
> > called, and processing continues (there's a couple nontrivial details
> > here, but it should be doable).
> >
> > The biggest issue is what constitutes a "cache miss". It's fairly
> > trivial to do this for syscache / relcache, but that's not sufficient:
> > there's plenty cases where catalogs are accessed without going through
> > either. But as far as I can tell if we declared that all historic
> > accesses have to go through systable_beginscan* - which'd imo not be a
> > crazy restriction - we could put the checks at that layer.
> >
>
> Documenting that historic accesses go through systable_* APIs does
> seem reasonable. In our earlier discussions, we felt asking plugin
> writers to do anything along these lines was too onerous and
> cumbersome to expect.

But they don't really need to do that - in just about all cases access
"automatically" goes through systable_* or layers above. If you call
output functions, do syscache lookups, etc you're good.

> > That'd require that an index lookup can't crash if the corresponding
> > heap entry doesn't exist (etc), but that's something we need to handle
> > anyway. The issue that multiple separate catalog lookups need to be
> > coherent (say Robert's pg_class exists, but pg_attribute doesn't
> > example) is solved by virtue of the the pg_attribute lookups failing if
> > the transaction aborted.
> >
> > Am I missing something here?
> >
>
> Are you suggesting we have a:
>
> PG_TRY()
> {
> Catalog_Access();
> }
> PG_CATCH()
> {
> Abort_Handling();
> }
>
> here?

Not quite, no. Basically, in a simplified manner, the logical decoding
loop is like:

while (true)
record = readRecord()
logical = decodeRecord()

PG_TRY():
StartTransactionCommand();

switch (TypeOf(logical))
case INSERT:
insert_callback(logical);
break;
...

CommitTransactionCommand();

PG_CATCH():
AbortCurrentTransaction();
PG_RE_THROW();

what I'm proposing is that that various catalog access functions throw a
new class of error, something like "decoding aborted transactions". The
PG_CATCH() above would then not unconditionally re-throw, but set a flag
and continue iff that class of error was detected.

while (true)
if (in_progress_xact_abort_pending)
StartTransactionCommand();
in_progress_xact_abort_callback(made_up_record);
in_progress_xact_abort_pending = false;
CommitTransactionCommand();

record = readRecord()
logical = decodeRecord()

PG_TRY():
StartTransactionCommand();

switch (TypeOf(logical))
case INSERT:
insert_callback(logical);
break;
...

CommitTransactionCommand();

PG_CATCH():
AbortCurrentTransaction();
if (errclass == DECODING_ABORTED_XACT)
in_progress_xact_abort_pending = true;
continue;
else
PG_RE_THROW();

Now obviously that's just pseudo code with lotsa things missing, but I
think the basic idea should come through?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-07-20 15:03:23 Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
Previous Message David Rowley 2018-07-20 14:57:16 Re: Making "COPY partitioned_table FROM" faster