Re: Two proposed modifications to the PostgreSQL FDW

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Two proposed modifications to the PostgreSQL FDW
Date: 2018-08-22 08:13:11
Message-ID: CAN-RpxBAjsLc1=6JVyBU5XXuEC7gCmtTG1b5S6Mn=VJqEiwWgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 22, 2018 at 9:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Wed, Aug 22, 2018 at 1:20 PM Chris Travers <chris(dot)travers(at)adjust(dot)com>
> wrote:
> > The two things I would suggest is that rather than auto-detecting (if I
> understand your patch correctly) whether prepared transactions are possible
> on the other system, making it an option to the foreign server or foreign
> table. Otherwise one might enable prepared transactions for one set of
> operations on one database and have it automatically cause headaches in
> another context.
>
> Yeah, currently it's an option for foreign servers. The patch adds a
> new option "two_phase_commit" to postgres_fdw.
>

Seems sane.

>
> >
> > The other thing I wasn't quite sure about on your patch was what happens
> if, say, someone trips over a power cord while the background worker is
> trying to commit things, whether the information is available on the
> initiating server when it comes back. whether a DBA has to go out and try
> to figure out what needs to be committed remotely, and how this would be
> done. If you could explain that process, that would be helpful to me.
> >
> > (In my approach these would be recorded on the master and an SQL
> function could re-initiate the background worker.)
>
> My approach is almost the same as yours. For details, in the
> pre-commit we do WAL-logging for each participants server before
> preparing transactions on the remote sides. The WAL has information of
> participants foreign servers(foreign server oid, database oid etc) and
> its global transaction identifier. Even if plug-pulled during trying
> to commit we can recover the global transactions that are not
> completed yet and its participants information from WAL. After the
> recovery users needs to execute the SQL function to fix the
> in-completed global transactions. Since the function can find out
> whether the remote transaction should be committed or rollback-ed by
> checking CLOG. Does my answer make sense?
>

Yeah. That is probably more elegant than my solution. I do wonder though
if the next phase would not be to add some sort of hook to automatically
start the background worker in this case.

>
> >>
> >>
> >> >
> >> > Moreover since COMMIT PREPARED occurs during the commit hook, not the
> precommit hook, it is too late to roll back the local transaction. We
> cannot raise errors since this causes a conflict in the commit status of
> the local transaction. So when we commit the local transaction we commit
> to committing all prepared transactions as soon as possible. Note some
> changes need to be made to make this usable in the FDW context, so what I
> am hoping is that the dialog helps impact the discussion and options going
> forward.
> >> >
> >> >>
> >> >> Also
> >> >> since we don't want to wait for COMMIT PREPARED to complete we need
> to
> >> >> consider that users could cancel the query anytime. To not break the
> >> >> current semantics we cannot raise error during 2nd phase of two-phase
> >> >> commit but it's not realistic because even the palloc() can raise an
> >> >> error.
> >> >
> >> >
> >> > We don't palloc. All memory used here is on the stack. I do allow
> for dramatic precondition checks to cause errors but those should never
> happen absent some other programmer doing something dramatically unsafe
> anyway. For example if you try to double-commit a transaction set.....
> >>
> >> Sorry, palloc() is just an example. I'm not sure all FDWs can
> >> implement all callbacks for two-phase commit without codes that could
> >> emit errors.
> >
> >
> > Yeah, but if you are in the commit hook and someone emits an error,
> that's wrong because that then tries to rollback an already committed
> transaction and the backend rightfully panics. In fact I should probably
> strip out the precondition check errors there and issue a warning. It
> might sometimes happen when something goes seriously wrong on a system
> level, but....
>
> In my patch since the commit hook is performed by the background
> worker not by the backends it's no problem if someone emits an error
> in the commit hook. After the backend prepared transactions on the all
> remote side, it enqueue itself to the wait queue. The background
> worker gets the global transaction waiting to be completed and commit
> prepared transaction on all remote side. After completed the global
> transaction the background worker dequeue it.
>

Seems sane. I was firing off one background worker per global transaction
that needed cleanup. This might be an area to think about in terms of
questions of larger parallelism in remote writes.

>
> >>
> >>
> >> >
> >> > There is a possible of system errors if one can no longer write to
> the file log but at this point as long as we have logged the phase change
> to commit we are able to recover later.
> >> >
> >> > So in the event where one sees an error here one continues on to the
> next transaction in the global transaction set and tries to commit it, etc.
> until it runs through the entire set of prepared transactions. Then if
> there were any errors it fires off a background worker which re-reads the
> log file and goes out to the various foreign servers, checks to see if
> there is a prepared transaction, and if so commits it. If the transaction
> set state was in rollback, it tries to roll it back instead. If this
> errors,, it sleeps for a second and then loops through those which errored
> and retries until all are complete.
> >>
> >> Yeah, the patch has the similar functionality.
> >>
> >> > The other thing is we record whether we are committing or rolling
> back the transaction when we hit the commit or rollback hook. This is
> critical because we can imagine a world where the Oracle FDW supports
> similar semantics. In that case everything works and is not ordering
> dependent. I.e. we can prepare our transactions. Oracle can try and fail,
> and rollback, and we rollback all the transactions everywhere. And all we
> have to know was we got to the precommit hook and then we rolled back.
> >>
> >> In my patch the global transaction manager manages each status of
> >> foreign servers participating in global transactions with WAL logging.
> >> The fate of transaction on foreign server will be determined according
> >> to the state of local transaction and their status. WAL logging is
> >> important because not only in term of speedup but also supporting
> >> streaming replication.
> >
> >
> > So you are optimizing for large numbers of prepared transactions or at a
> high rate?
>
> I don't do optimizations much for the patch as this is the first
> implementation. Once the basic feature committed I will do that.
>
> >
> > Also does the background worker get fired again on recovery as needed?
>
> No. I added new SQL function to fix global transactions. We need to
> execute that function manually after recovery.
>

And that is solely on a case where the db does a full restart. Is it
possible the background worker could survive a backend restart?

>
> >>
> >>
> >> >>
> >> >> The design the patch chose is making backends do only PREPARE and
> wait
> >> >> for the background worker to complete COMMIT PREPARED. In this design
> >> >> the clients get a "committed" only either when successful in commit
> on
> >> >> all participants or when they cancel the query explicitly. In other
> >> >> words, the client will wait for completion of 2nd phase of two-phase
> >> >> commit forever unless it cancels.
> >> >
> >> >
> >> > In this approach we make a best effort to commit or rollback (as
> appropriate in the state of the global transaction) *all* remote
> transactions during global commit or global rollback. It is not guaranteed
> but it avoids breaking semantics as much as we can. Also the background
> worker here does not need to attach to shared memory since the log has
> everything required. COMMIT PREPARED ought to be a fast operation unless
> there are network problems but those can affect prepare as well.
> >> >
> >> > Also imagine a case where you are writing to three dbs. One is on
> Oracle, one on DB2, and one on PostgreSQL You successfully prepare your
> transaction. DB2 successfully prepares, and then the Oracle db errors for
> some reason (maybe a deferred constraint). Does the background worker have
> enough information to know to roll back your transaction on the remote side?
> >>
> >> I think that what the background worker needs to know to rollback
> >> remote transactions are how to rollback and what to rollback. How to
> >> rollback is defined in each FDWs.
> >
> >
> > Agreed. And naturally same with commits.
> >
> > My assumption is that each foreign data wrapper would have to set its
> own precommit/commit hook callbacks. I think your patch extends the fdw
> structure to try to ensure these are done automatically?
>
> Yes. The patch adds new FDW APIs for the atomic commit such as
> prepare, commit, rollback, resolve(2nd phase of 2PC). The FDW
> developers who want make their FDW support the atomic commit need to
> define these API and call the registration function when transaction
> starts. If the FDW of the registered foreign server doesn't support
> FDW's atomic commit API the transaction emit an error.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-08-22 08:15:18 Re: JIT compiling with LLVM v12
Previous Message Michael Paquier 2018-08-22 08:11:26 Re: BUG #15346: Replica fails to start after the crash