Re: Two proposed modifications to the PostgreSQL FDW

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Two proposed modifications to the PostgreSQL FDW
Date: 2018-08-21 08:35:38
Message-ID: CAN-RpxCZLcsMRgZD+TmstVgRiU7F+YBVhc4o-S3RCgqO03Rqsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Tue, Aug 21, 2018 at 1:47 AM Chris Travers <chris(dot)travers(at)adjust(dot)com>
> wrote:
> >
> >
> >
> > On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> >>
> >> Hi,
> >>
> >> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
> >> > 2. TWOPHASECOMMIT=[off|on] option
> >>
> >> > The second major issue that I see with PostgreSQL's foreign database
> >> > wrappers is the fact that there is no two phase commit which means
> that a
> >> > single transaction writing to a group of tables has no expectation
> that all
> >> > backends will commit or rollback together. With this patch an option
> would
> >> > be applied to foreign tables such that they could be set to use two
> phase
> >> > commit When this is done, the first write to each backend would
> register a
> >> > connection with a global transaction handler and a pre-commit and
> commit
> >> > hooks would be set up to properly process these.
> >> >
> >> > On recommit a per-global-transaction file would be opened in the data
> >> > directory and prepare statements logged to the file. On error, we
> simply
> >> > roll back our local transaction.
> >> >
> >> > On commit hook , we go through and start to commit the remote global
> >> > transactions. At this point we make a best effort but track whether
> or not
> >> > we were successfully on all. If successful on all, we delete the
> file. If
> >> > unsuccessful we fire a background worker which re-reads the file and
> is
> >> > responsible for cleanup. If global transactions persist, a SQL
> >> > administration function will be made available to restart the cleanup
> >> > process. On rollback, we do like commit but we roll back all
> transactions
> >> > in the set. The file has enough information to determine whether we
> should
> >> > be committing or rolling back on cleanup.
> >> >
> >> > I would like to push these both for Pg 12. Is there any feedback on
> the
> >> > concepts and the problems first
> >>
>
> Thank you for the proposal. I agree that it's a major problem that
> postgres_fdw (or PostgreSQL core API) doesn't support two-phase
> commit.
>
> >> There's been *substantial* work on this. You should at least read the
> >> discussion & coordinate with the relevant developers.
> >
> >
> > I suppose I should forward this to them directly also.
> >
> > Yeah. Also the transaction manager code for this I wrote while helping
> with a proof of concept for this copy-to-remote extension.
> >
> > There are a few big differences in implementation with the patches you
> mention and the disagreement was part of why I thought about going this
> direction.
> >
> > First, discussion of differences in implementation:
> >
> > 1. I treat the local and remote transactions symmetrically and I make
> no assumptions about what might happen between prepare and an attempted
> local commit.
> > prepare goes into the precommit hook
> > commit goes into the commit hook and we never raise errors if it
> fails (because you cannot rollback at that point). Instead a warning is
> raised and cleanup commences.
> > rollback goes into the rollback hook and we never raise errors if it
> fails (because you are already rolling back).
> >
> > 2. By treating this as a property of a table rather than a property of
> a foreign data wrapper or a server, we can better prevent prepared
> transactions where they have not been enabled.
> > This also ensures that we know whether we are guaranteeing two phase
> commit or not by looking at the table.
> >
> > 3. By making this opt-in it avoids a lot of problems with regards to
> incorrect configuration etc since if the DBA says "use two phase commit"
> and failed to enable prepared transactions on the other side...
> >
> > On to failure modes:
> > 1. Its possible that under high load too many foreign transactions are
> prepared and things start rolling back instead of committing. Oh well....
> > 2. In the event that a foreign server goes away between prepare and
> commit, we continue to retry via the background worker. The background
> worker is very pessimistic and checks every remote system for the named
> transaction.
>
> If some participant servers fail during COMMIT PREPARED, will the
> client get a "committed"? or an "aborted"? If the client gets
> "aborted", that's not correct because the local changes are already
> committed at that point.

Ok so let's discuss this in more detail here.

You have basically 6 states a TPC global transaction can be in.
1. We haven't gotten to the point of trying to commit (BEGIN)
2. We are trying to commit (PREPARE)
3. We have committed to committing all transactions (COMMIT)
4. We have committed to rolling back all transactions (ROLLBACK)
5. We have successfully committed OR rolled back all transactions
(COMPLETE)
6. We tried to commit or rollback all transactions and got some errors
(INCOMPLETE)

During COMMIT PREPARED we cannot raise errors to PostgreSQL. We have
already committed to committing and therefore the only way forward is to
fix the problem.

> On the other hand, if the client get
> "committed" it might break the current user semantics because the
> subsequent reads may not be able to see the own committed writes.

Actually it is worse than that and this is why automatic attempted recovery
is an absolute requirement. If you cannot commit prepared, then you have a
prepared statement that is stuck on the remote side. This sets auto vacuum
horizons and some other nastiness. So we have to note, move on, and try to
fix.

Moreover since COMMIT PREPARED occurs during the commit hook, not the
precommit hook, it is too late to roll back the local transaction. We
cannot raise errors since this causes a conflict in the commit status of
the local transaction. So when we commit the local transaction we commit
to committing all prepared transactions as soon as possible. Note some
changes need to be made to make this usable in the FDW context, so what I
am hoping is that the dialog helps impact the discussion and options going
forward.

> Also
> since we don't want to wait for COMMIT PREPARED to complete we need to
> consider that users could cancel the query anytime. To not break the
> current semantics we cannot raise error during 2nd phase of two-phase
> commit but it's not realistic because even the palloc() can raise an
> error.
>

We don't palloc. All memory used here is on the stack. I do allow for
dramatic precondition checks to cause errors but those should never happen
absent some other programmer doing something dramatically unsafe anyway.
For example if you try to double-commit a transaction set.....

There is a possible of system errors if one can no longer write to the file
log but at this point as long as we have logged the phase change to commit
we are able to recover later.

So in the event where one sees an error here one continues on to the next
transaction in the global transaction set and tries to commit it, etc.
until it runs through the entire set of prepared transactions. Then if
there were any errors it fires off a background worker which re-reads the
log file and goes out to the various foreign servers, checks to see if
there is a prepared transaction, and if so commits it. If the transaction
set state was in rollback, it tries to roll it back instead. If this
errors,, it sleeps for a second and then loops through those which errored
and retries until all are complete.

The other thing is we record whether we are committing or rolling back the
transaction when we hit the commit or rollback hook. This is critical
because we can imagine a world where the Oracle FDW supports similar
semantics. In that case everything works and is not ordering dependent.
I.e. we can prepare our transactions. Oracle can try and fail, and
rollback, and we rollback all the transactions everywhere. And all we have
to know was we got to the precommit hook and then we rolled back.

>
> The design the patch chose is making backends do only PREPARE and wait
> for the background worker to complete COMMIT PREPARED. In this design
> the clients get a "committed" only either when successful in commit on
> all participants or when they cancel the query explicitly. In other
> words, the client will wait for completion of 2nd phase of two-phase
> commit forever unless it cancels.
>

In this approach we make a best effort to commit or rollback (as
appropriate in the state of the global transaction) *all* remote
transactions during global commit or global rollback. It is not guaranteed
but it avoids breaking semantics as much as we can. Also the background
worker here does not need to attach to shared memory since the log has
everything required. COMMIT PREPARED ought to be a fast operation unless
there are network problems but those can affect prepare as well.

Also imagine a case where you are writing to three dbs. One is on Oracle,
one on DB2, and one on PostgreSQL You successfully prepare your
transaction. DB2 successfully prepares, and then the Oracle db errors for
some reason (maybe a deferred constraint). Does the background worker have
enough information to know to roll back your transaction on the remote side?

>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Travers 2018-08-21 08:49:29 Re: Two proposed modifications to the PostgreSQL FDW
Previous Message Kyotaro HORIGUCHI 2018-08-21 08:17:31 Re: A typo in guc.c