Re: Feature suggestions: "dead letter"-savepoint.

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Terje Elde <terje(at)elde(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Feature suggestions: "dead letter"-savepoint.
Date: 2016-06-23 12:31:57
Message-ID: CAMsr+YGxvhfkAb8hnt=Jr8pubimz_98rTCZ2er=2CBVSKtug_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 June 2016 at 17:34, Terje Elde <terje(at)elde(dot)net> wrote:

>
> But what if there’s a bug making a call to the external service? Most of
> the time, you’ll trap the error and set status to something sane, but what
> if there’s a crash-bug in the SDK implementing it, or some other situation
> where things go very bad? The rocket might be fired, then the client dies,
> lock is released, another worker picks up the task, and repeats the process
> ad nausium.
>
> Okay, so you’ll just update the row to say you’re trying to send it. You
> set status=‘in-flight’ or some other status that’ll prevent the SELECT in
> other workers from picking up the task, and you commit that, so other
> workers won’t pick up the row if you die. In the process though, you also
> loose the lock on the row. You still want the row to be tied to you
> specifically, so you add a unique tag to the row, that later needs to be
> removed, so there’s more housekeeping.
>

It sounds like you're trying to re-invent distributed commit. Don't do
that. It's hard. Use 2PC and an external transaction co-ordinator that can
do in-doubt transaction resolution. On Java you want JTA. On Windows you
want MSDTC. In C you want to run screaming, um, I mean use XA.

> The basic idea is to be able to say “If I go belly up, I want this stuff
> to happen”.

This only papers over the problem rather weakly. What if the PostgreSQL
backend dies undexpectedly, not just the client? You've still got a
problem. Unless you're thinking of something that'd write to WAL then do
the work in some kind of critical section where if we fail we panic the
server and it gets done during WAL redo, or something like that.

> Depending on different needs, could be made persistent once the savepoint
> is taken, but for a lot of cases that wouldn’t really be needed.

It's starting to sound a lot like 2PC, you know.

> There’s some room for variation as well, such as having it support only
> dropped connections, or also support turning errors and/or rollbacks into
> rollback to and commits of the savepoint. Ideally configurable at the
> point the snapshot it taken, to easily support pr. snapshot variation.
>

Very like 2PC.

> I did for a few moments wonder if prepared transactions would be a better
> place for something like this. It could allow for named independent
> transactions, but there’s a fairly big mismatch between the two concepts.
> It also wouldn’t be too hard to use multiple named savepoints for
> effectively the same logic for most cases. One advantage of prepared
> transactions is that it could perhaps also cover the case of a postgresql
> child dying, but that’s not exactly a common problem. A huge dealbreaker
> though, is that the prepared transaction would very likely keep conflicting
> locks with the work to be done.
>

Actually, that's a benefit. It means you hold a lock on the row you're
working on until your coordinator determines whether the action was
actually performed or not, and commits or rolls back the 2PC prepared xact.
In the mean time nobody else tries to grab that same row and work on it.

Now, what I do think we need is to give the client the ability to determine
whether one of its xacts actually committed or not when it lost the session
after dispatching COMMIT but before getting a confirmation from the server
and persistently storing that knowledge. Right now if you want that you
have to do full 2PC. You shouldn't need to, you should be able to get the
xid when the xact is assigned it and store it somewhere locally. Then
later, if you're unsure if that xid committed or not due to a client crash
etc, you should be able to do some kind of SELECT pg_xact_is_committed(xid)
to find out. Right now this is possible to write with a pretty simple
extension, but adds an extra roundtrip for a SELECT txid_current() call
(unless you pipeline it). I'd prefer that the server just tell you when an
xid is assigned. And yes, I think xid is the right identifier for this;
it's short, simple, and while it wraps around it takes long enough to do so
that it's very well suited for this job.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message hari.prasath 2016-06-23 12:52:33 Extract Jsonb key and values
Previous Message Terje Elde 2016-06-23 10:48:20 Re: Feature suggestions: "dead letter"-savepoint.