Re: Two proposed modifications to the PostgreSQL FDW

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Two proposed modifications to the PostgreSQL FDW
Date: 2018-08-20 16:46:32
Message-ID: CAN-RpxA-hAA5Gwo8dWG9JCeJNH6+ib8G8fpn8Y55nCiEEb_P1g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
> > 1. INSERTMETHOD=[insert|copy] option on foreign table.
> >
> > One significant limitation of the PostgreSQL FDW is that it does a
> > prepared statement insert on each row written which imposes a per-row
> > latency. This hits environments where there is significant latency or
> > few latency guarantees particularly hard, for example, writing to a
> > foreign table that might be physically located on another continent.
> > The idea is that INSERTMETHOD would default to insert and therefore
> > have no changes but where needed people could specify COPY which would
> > stream the data out. Updates would still be unaffected.
>
> That has a *lot* of semantics issues, because you suddenly don't get
> synchronous error reports anymore. I don't think that's OK on a
> per-table basis. If we invented something like this, it IMO should be a
> per-statement explicit opt in that'd allow streaming.
>

I want to push back a bit and justify the per-table decision here.

This is primarily useful when two things are present:
1. There is significant lag between the servers
2. There are significant bulk operations on the table.

Only when both are present does it make sense to use this option. If you
have a low-latency connection, don't use it. If you are only writing a few
rows here and there, don't use it. If you have a high-latency connection
*and* you are inserting large numbers of rows at a time, that is where you
run into a problem, and the current FDW approach is, frankly, unusable in
these cases. These characteristics follow the server (and hence the table)
in the first case and the table only in the second case. Therefore the
characteristics of the foreign table determine whether it makes sense to
consider this option. Doing this at the statement level poses more
problems, in my view, than it fixes.
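
To make the proposal concrete, here is a minimal sketch of how the option
might be declared. The insertmethod option name comes from the proposal
above and does not exist yet; the server and table names are purely
illustrative.

    -- hypothetical: insertmethod is the proposed per-table option
    CREATE FOREIGN TABLE measurements_remote (
        id         bigint,
        recorded   timestamptz,
        value      double precision
    ) SERVER transatlantic_srv
      OPTIONS (table_name 'measurements', insertmethod 'copy');

    -- or turned on later for an existing foreign table
    ALTER FOREIGN TABLE measurements_remote
      OPTIONS (ADD insertmethod 'copy');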

Moreover, the idea of batching inserts doesn't help either (and would
actually be worse in many cases than the current approach), since the
current approach uses a prepared statement which gets called for each row.
Doing a batch insert might save you the per-row round trip, but it adds a
great deal of complexity to the overall operation.
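
For context, what postgres_fdw does today is roughly the SQL-level
equivalent of the following; the statement and column names are
illustrative, and the real thing happens at the protocol level rather than
via PREPARE/EXECUTE:

    -- prepared once, then executed once per row: every row pays a full
    -- network round trip before the next row can be sent
    PREPARE fdw_ins (timestamptz, double precision) AS
      INSERT INTO measurements (recorded, value) VALUES ($1, $2);
    EXECUTE fdw_ins ('2018-08-20 16:46:32+00', 0.5);  -- round trip
    EXECUTE fdw_ins ('2018-08-20 16:46:33+00', 0.7);  -- another round trip
    DEALLOCATE fdw_ins;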

Now, a per-statement opt-in poses a number of questions. Obviously we'd
have to allow this on local tables too, but it would have no effect. So
suddenly you have a semantic construct which *may* or *may not* pose the
problems you mention, and which in fact may or may not pose the problem on
foreign tables, because whether the construct does anything depends on the
FDW driver. I don't see what exactly one buys by shifting the problem in
this way, given that when this is useful it will be used on virtually all
inserts and when it is not useful it won't be used at all.

Now, the entire problem is the expectation of synchronous errors on a
per-row basis. This is what causes the problem with inserting a million
rows over a transcontinental link (insert row, check error, insert row,
check error, etc.).

So on to the question of containing the problems. I think the best way to
contain this is to have one COPY statement per insert planner node. That
makes sure the basic guarantee that a planner node either succeeds or
errors is met. For sequences, I would expect that supplying a column list
without the relevant sequence-generated column should behave in a tolerable
way (otherwise two concurrent copies would have problems).
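
A minimal sketch of what one insert planner node might send to the remote,
assuming an illustrative remote table measurements(id bigserial, recorded
timestamptz, value double precision); the column list omits id so that the
remote server's own sequence fills it in, which is why two concurrent
copies do not collide:

    -- one COPY per insert planner node; it either succeeds or errors as
    -- a whole, preserving the per-node success-or-error guarantee
    COPY measurements (recorded, value) FROM STDIN WITH (FORMAT csv);
    2018-08-20 16:46:32+00,0.5
    2018-08-20 16:46:33+00,0.7
    \.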

So that's the primary defense. Now let's look at the main alternatives.

1. You can use a table where you logically replicate inserts, and where
you insert then truncate or delete. This works fine until you have enough
WAL traffic that you cannot tolerate a replication outage...
2. You can write an extension which allows you to COPY a current table
into a foreign table. The problem here is that SPI has a bunch of
protections to keep you from copying to stdout, so when I helped with this
proof of concept we copied to a temp file in a ramdisk and copied from
there, which is frankly quite a bit worse than this approach.

If you want a per statement opt-in, I guess the questions that come to my
mind are:

1. What would this opt-in look like?
2. How important is it that we avoid breaking current FDW interfaces to
implement it?
3. How would you expect it to behave if directed at a local table or a
foreign table on, say, a MySQL db?

>
>
> > 2. TWOPHASECOMMIT=[off|on] option
>
> > The second major issue that I see with PostgreSQL's foreign database
> > wrappers is the fact that there is no two phase commit, which means
> > that a single transaction writing to a group of tables has no
> > expectation that all backends will commit or roll back together. With
> > this patch an option would be applied to foreign tables such that they
> > could be set to use two phase commit. When this is done, the first
> > write to each backend would register a connection with a global
> > transaction handler, and pre-commit and commit hooks would be set up
> > to properly process these.
> >
> > On precommit a per-global-transaction file would be opened in the data
> > directory and prepare statements logged to the file. On error, we
> > simply roll back our local transaction.
> >
> > On commit hook, we go through and start to commit the remote global
> > transactions. At this point we make a best effort but track whether
> > or not we were successful on all. If successful on all, we delete the
> > file. If unsuccessful we fire a background worker which re-reads the
> > file and is responsible for cleanup. If global transactions persist,
> > a SQL administration function will be made available to restart the
> > cleanup process. On rollback, we do like commit but we roll back all
> > transactions in the set. The file has enough information to determine
> > whether we should be committing or rolling back on cleanup.
> >
> > I would like to push these both for Pg 12. Is there any feedback on
> > the concepts and the problems first?
>
> There's been *substantial* work on this. You should at least read the
> discussion & coordinate with the relevant developers.
>

I suppose I should forward this to them directly also.

Yeah. Also, I wrote the transaction manager code for this while helping
with a proof of concept for this copy-to-remote extension.

There are a few big differences in implementation between this and the
patches you mention, and the disagreement was part of why I thought about
going in this direction.

First, discussion of differences in implementation:

1. I treat the local and remote transactions symmetrically, and I make no
assumptions about what might happen between prepare and an attempted local
commit:
   - prepare goes into the precommit hook
   - commit goes into the commit hook, and we never raise errors if it
     fails (because you cannot roll back at that point); instead a warning
     is raised and cleanup commences
   - rollback goes into the rollback hook, and we never raise errors if it
     fails (because you are already rolling back)
   (A rough sketch of what this sends to each remote follows this list.)

2. By treating this as a property of a table rather than a property of a
foreign data wrapper or a server, we can better prevent prepared
transactions where they have not been enabled.
   This also ensures that we know, just by looking at the table, whether
we are guaranteeing two phase commit.

3. By making this opt-in, we avoid a lot of problems with regard to
incorrect configuration: if the DBA says "use two phase commit" but has
failed to enable prepared transactions on the other side, the first prepare
fails with an immediate error instead of silently losing atomicity.
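
To make that concrete, here is a rough sketch of the intent. The
twophasecommit option name comes from the proposal; the table, server, and
GID names are purely illustrative, and the remote-side statements would be
issued by the FDW from the hooks described in point 1:

    -- opt a foreign table in to two phase commit (proposed option name)
    ALTER FOREIGN TABLE payments_remote
      OPTIONS (ADD twophasecommit 'on');

    -- what the FDW would send to each participating remote server:
    -- at the precommit hook, after the writes for this transaction:
    PREPARE TRANSACTION 'fdw_gtx_0001_node2';
    -- at the commit hook (a failure here is a warning, not an error):
    COMMIT PREPARED 'fdw_gtx_0001_node2';
    -- or, at the rollback hook:
    -- ROLLBACK PREPARED 'fdw_gtx_0001_node2';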

On to failure modes:
1. It's possible that under high load too many foreign transactions are
prepared and things start rolling back instead of committing. Oh well...
2. In the event that a foreign server goes away between prepare and
commit, we continue to retry via the background worker. The background
worker is very pessimistic and checks every remote system for the named
transaction.
3. Administrator action is required in two specific cases. In the first
case, the controlling server crashes, restarts, etc. along with the cleanup
worker while cleanup is running. It would be possible to patch the backend
to provide a cleanup service of this sort on startup, but this would
require some design decisions to be made now. In the second case, a
transaction has been around for a while and is causing vacuum horizon
issues, so it needs to be manually resolved. Maybe the reason here is
beyond our control, such as another precommit hook stalling in another
extension. (A sketch of what manual resolution looks like follows this
list.)
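
For reference, a sketch of what that manual resolution would look like on a
remote server, using the standard machinery for orphaned prepared
transactions; the GID shown is illustrative:

    -- on the remote server: find prepared transactions left behind
    SELECT gid, prepared, owner, database
      FROM pg_prepared_xacts;

    -- then resolve each one according to what the transaction log says:
    COMMIT PREPARED 'fdw_gtx_0001_node2';
    -- or
    -- ROLLBACK PREPARED 'fdw_gtx_0001_node2';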

Part of the reason for this discussion is to try to get feedback not only
from the current developers but also those (like Tom) who objected the last
time around.

For example, if it is a requirement that we have automatic cleanup in the
event of a server restart, then the backend needs to know what to do about
these files. If not, then we leave the whole thing in the FDW code. If the
backend has to know, we probably need a file header sufficient for the
backend to decide that this is a PostgreSQL-FDW global transaction log as
opposed to, say, one that could be added for the Oracle FDW later...
Maybe mentioning the .so and symbol for the background worker entry point?

> See https://commitfest.postgresql.org/19/1574/ and the referenced
> discussions.
>
> Greetings,
>
> Andres Freund
>

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin
