COPY IN/BOTH vs. extended query mode

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: COPY IN/BOTH vs. extended query mode
Date: 2017-01-24 02:12:40
Message-ID: CA+Tgmoa4eA+cPXaiGQmEBp9XisVd3ZE9dbvnbZEvx9UcMiw2tg@mail.gmail.com
Lists: pgsql-hackers

According to the documentation for COPY IN mode, "If the COPY command
was issued via an extended-query message, the backend will now discard
frontend messages until a Sync message is received, then it will issue
ReadyForQuery and return to normal processing." I added a similar
note to the documentation for COPY BOTH mode in
91fa8532f4053468acc08534a6aac516ccde47b7, and the documentation
accurately describes the behavior of the server. However, this seems
to make fully correct error handling for clients using libpq almost
impossible, because PQsendQueryGuts() sends
Parse-Bind-Describe-Execute-Sync in one shot without regard to whether
the command that was just sent invoked COPY mode (cf. the note in
CopyGetData about why we ignore Flush and Sync in that function).
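To make "in one shot" concrete, here is a minimal Python sketch of the batch of messages described above, framed according to the v3 wire protocol (one type byte, a four-byte length that counts itself, then the payload). The helper names (msg, cstr, extended_query_batch, message_types) are hypothetical and are not libpq functions. The Bind message is simplified to carry no parameters and no format codes, so this only approximates what libpq puts on the wire.

```python
import struct


def msg(type_byte: bytes, payload: bytes) -> bytes:
    """Frame one frontend message: type byte, Int32 length (self-inclusive), payload."""
    return type_byte + struct.pack("!i", len(payload) + 4) + payload


def cstr(s: str) -> bytes:
    """Encode a protocol String: the bytes followed by a NUL terminator."""
    return s.encode("utf-8") + b"\x00"


def extended_query_batch(query: str) -> bytes:
    """Build Parse-Bind-Describe-Execute-Sync for the unnamed statement and portal."""
    # Parse: statement name, query text, number of parameter type OIDs (0).
    parse = msg(b"P", cstr("") + cstr(query) + struct.pack("!h", 0))
    # Bind (simplified assumption): portal, statement, 0 parameter format
    # codes, 0 parameters, 0 result format codes.
    bind = msg(b"B", cstr("") + cstr("") + struct.pack("!hhh", 0, 0, 0))
    # Describe: 'P' means "describe a portal", then the portal name.
    describe = msg(b"D", b"P" + cstr(""))
    # Execute: portal name, maximum row count (0 means no limit).
    execute = msg(b"E", cstr("") + struct.pack("!i", 0))
    # Sync has no payload. It is appended unconditionally, whether or not
    # the query turns out to start COPY mode.
    sync = msg(b"S", b"")
    return parse + bind + describe + execute + sync


def message_types(buf: bytes) -> list:
    """Walk a buffer of framed messages and return their type bytes in order."""
    types = []
    pos = 0
    while pos < len(buf):
        types.append(chr(buf[pos]))
        (length,) = struct.unpack_from("!i", buf, pos + 1)
        pos += 1 + length  # the length field counts itself but not the type byte
    return types
```

Calling message_types(extended_query_batch("COPY t FROM STDIN")) walks the buffer and shows that the Sync is already queued behind the Execute before the client has seen any response at all.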

So imagine that the client uses libpq to send (via the extended query
protocol) a COPY IN command (or some hypothetical command that puts the
server into COPY BOTH mode). If the server throws an error before the
Sync message is consumed, it will bounce back to PostgresMain, which
will set doing_extended_query_message = true, after which it will
consume messages, find the Sync, reset that flag, and send
ReadyForQuery. On the other hand, if the server enters CopyBoth mode,
consumes the Sync message in CopyGetData (or a similar function), and
*then* throws an ERROR, the server will wait for a second Sync message
from the client before issuing ReadyForQuery. There is no sensible
way of coping with this problem in libpq, because there is no way for
the client to know which part of the server code consumed the Sync
message that it already sent. In short, from the client's point of
view, if it enters COPY IN or COPY BOTH mode via the extended query
protocol, and an error occurs on the server, the server MAY OR MAY NOT
expect a further Sync message before issuing ReadyForQuery, and the
client has no way of knowing -- except maybe waiting for a while to
see what happens.
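The two outcomes can be reproduced with a toy model. The Python sketch below is not the server code: simulate, its error_point argument, and the string message names are all hypothetical, and the model only captures the two behaviors described above (skip-until-Sync after an error, and a CopyGetData-style loop that swallows Flush and Sync).

```python
from collections import deque


def simulate(inbound, error_point):
    """Toy model of the backend's handling of one extended-query COPY.

    inbound     -- frontend message names, in arrival order
    error_point -- "before_sync_consumed": the error is raised before any
                   further input is read;
                   "after_sync_consumed": the server enters copy mode and
                   the error is raised on the first CopyData
    Returns the list of backend responses, ending with a parenthesized
    note if the server is left waiting for input.
    """
    queue = deque(inbound)
    sent = []

    # Normal extended-query processing, up to and including Execute.
    while queue and queue.popleft() != "Execute":
        pass

    if error_point == "before_sync_consumed":
        sent.append("ErrorResponse")
    else:
        sent.append("CopyInResponse")
        # Copy-mode read loop: Flush and Sync are consumed and ignored.
        while queue:
            m = queue.popleft()
            if m in ("Flush", "Sync"):
                continue
            if m == "CopyData":
                sent.append("ErrorResponse")  # e.g. a malformed row
                break
        else:
            sent.append("(waiting for copy data)")
            return sent

    # Error recovery in the main loop: discard everything until a Sync.
    while queue:
        if queue.popleft() == "Sync":
            sent.append("ReadyForQuery")
            return sent
    sent.append("(waiting for Sync)")
    return sent
```

With batch = ["Parse", "Bind", "Describe", "Execute", "Sync"], simulate(batch, "before_sync_consumed") returns ["ErrorResponse", "ReadyForQuery"], while simulate(batch + ["CopyData"], "after_sync_consumed") returns ["CopyInResponse", "ErrorResponse", "(waiting for Sync)"]. Appending a second "Sync" to the latter input is what finally produces ReadyForQuery. The ErrorResponse by itself does not tell the client which of the two recovery paths it is on.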

It does not appear to me that there is any good solution to this
problem. Fixing it on the server side would require a wire protocol
change - e.g. one kind of Sync message that is used in a
Parse-Bind-Describe-Execute-Sync sequence that only terminates
non-COPY commands and another kind that is used to signal the end even
of COPY. Fixing it on the client side would require all clients to
know prior to initiating an extended-query-protocol sequence whether
or not the command was going to initiate COPY, which is an awful API
even if it didn't constitute an impossible-to-contemplate backward
compatibility break. Perhaps we will have to be content to document
the fact that this part of the protocol is depressingly broken...

...unless of course somebody can see something that I'm missing here
and the situation isn't as bad as it currently appears to me to be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
