Re: [HACKERS] logical decoding of two-phase transactions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-07-16 16:15:46
Message-ID: CA+TgmoZfuGSRBv=LDK2mk4Q1MB9wjMuL3AQfXAax2orDWerFUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 16, 2018 at 11:21 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Overall, I think it's clear the main risk associated with this patch is the
> decode group code - it touches PROC entries, so a bug may cause trouble
> pretty easily. So I've focused on this part, for now.

I agree. As a general statement, I think the idea of trying to
prevent transactions from aborting is really scary. It's almost an
axiom of the system that we're always allowed to abort, and I think
there could be a lot of unintended and difficult-to-fix consequences
of undermining that guarantee. I think it will be very difficult to
create a sound system for delaying transactions, and I doubt very much
that the proposed system is sound.

In particular:

- The do_wait loop contains a CHECK_FOR_INTERRUPTS(). If that makes
it interruptible, then it's possible for the abort to complete before
the decoding processes have aborted. If that can happen, then this
whole mechanism is completely pointless, because it fails to actually
achieve the guarantee which is its central goal. On the other hand,
if you don't make this abort interruptible, then you are significantly
increase the risk that a backend could get stuck in the abort path for
an unbounded period of time. If the aborting backend holds any
significant resources at this point, such as heavyweight locks, then
you risk creating a deadlock that cannot be broken until the decoding
process manages to abort, and if that process is involved in the
deadlock, then you risk creating an unbreakable deadlock.

- BackendXidGetProc() seems to be called in multiple places without
any lock held. I don't see how that can be safe, because AFAICS it
must inevitably introduce a race condition: the answer can change
after that value is returned but before it is used. There's a bunch
of recheck logic that looks like it is trying to cope with this
problem, but I'm not sure it's very solid. For example,
AssignDecodeGroupLeader reads proc->decodeGroupLeader without holding
any lock; we have historically avoided assuming that pointer-width
reads cannot be torn. (We have assumed this only for 4-byte reads or
narrower.) There are no comments about the locking hazards here, and
no real explanation of how the recheck algorithm tries to patch things
up:

+ leader = BackendXidGetProc(xid);
+ if (!leader || leader != proc)
+ {
+ LWLockRelease(leader_lwlock);
+ return NULL;
+ }

Can be non-NULL yet unequal to proc? I don't understand how that can
happen: surely once the PGPROC that has that XID aborts, the same XID
can't possibly be assigned to a different PGPROC.

- The code for releasing PGPROCs in ProcKill looks completely unsafe
to me. With locking groups for parallel query, a process always
enters a lock group of its own volition. It can safely use
(MyProc->lockGroupLeader != NULL) as a race-free test because no other
process can modify that value. But in this implementation of decoding
groups, one process can put another process into a decoding group,
which means this test has a race condition. If there's some reason
this is safe, the comments sure don't explain it.

I don't want to overplay my hand, but I think this code is a very long
way from being committable, and I am concerned that the fundamental
approach of blocking transaction aborts may be unsalvageably broken or
at least exceedingly dangerous.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-07-16 17:13:36 Another usability issue with our TAP tests
Previous Message David Fetter 2018-07-16 15:59:34 Re: Make foo=null a warning by default.