Re: COPY IN/BOTH vs. extended query mode

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY IN/BOTH vs. extended query mode
Date: 2021-12-16 03:03:11
Message-ID: dac86f9315450b12d419c602281fe682f2b24e84.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2017-01-23 at 21:12 -0500, Robert Haas wrote:
> According to the documentation for COPY IN mode, "If the COPY command
> was issued via an extended-query message, the backend will now
> discard
> frontend messages until a Sync message is received, then it will
> issue
> ReadyForQuery and return to normal processing." I added a similar
> note to the documentation for COPY BOTH mode in
> 91fa8532f4053468acc08534a6aac516ccde47b7, and the documentation
> accurately describes the behavior of the server. However, this seems
> to make fully correct error handling for clients using libpq almost
> impossible, because PQsendQueryGuts() sends
> Parse-Bind-Describe-Execute-Sync in one shot without regard to
> whether
> the command that was just sent invoked COPY mode (cf. the note in
> CopyGetData about why we ignore Flush and Sync in that function).
>
> So imagine that the client uses libpq to send (via the extended query
> protocol) a COPY IN command (or some hypothetical command that starts
> COPY BOTH mode to begin). If the server throws an error before the
> Sync message is consumed, it will bounce back to PostgresMain which
> will set doing_extended_query_message = true after which it will
> consume messages, find the Sync, reset that flag, and send
> ReadyForQuery. On the other hand, if the server enters CopyBoth
> mode,
> consumes the Sync message in CopyGetData (or a similar function), and
> *then* throws an ERROR, the server will wait for a second Sync
> message
> from the client before issuing ReadyForQuery. There is no sensible
> way of coping with this problem in libpq, because there is no way for
> the client to know which part of the server code consumed the Sync
> message that it already sent. In short, from the client's point of
> view, if it enters COPY IN or COPY BOTH mode via the extend query
> protocol, and an error occurs on the server, the server MAY OR MAY
> NOT
> expect a further Sync message before issuing ReadyForQuery, and the
> client has no way of knowing -- except maybe waiting for a while to
> see what happens.

I investigated a bit deeper here, and I'm not sure this is a real
problem (aside from ambiguity in the protocol docs).

If you send "COPY ... FROM STDIN" using the extended query protocol in
libpq, the non-error message flow is something like:

-> Parse + Bind + Describe + Execute + Sync
[ server processes Parse + Bind + Describe + Execute ]
[ server enters copy-in mode ]
<- CopyInResponse
[ server ignores Sync ]
-> CopyData
[ server processes CopyData ]
-> CopyDone
[ server processes CopyDone ]
[ server exits copy-in mode ]
-> Sync
[ server processes Sync ]
<- ReadyForQuery

If an error happens before the server enters copy-in mode (e.g. syntax
error), then you get something like:

-> Parse + Bind + Describe + Execute + Sync
[ server processes
Parse, encounters error ]
<- ErrorResponse
[ server ignores Bind +
Describe + Execute ]
[ server processes Sync ]
<- ReadyForQuery
[
client never got CopyInResponse, so never sent copy messages ]

If an error happens after the CopyInResponse is sent (e.g. malformed
data), you get something like:

-> Parse + Bind + Describe + Execute + Sync
[ server processes Bind + Describe + Execute ]
[ server enters copy-in mode ]
<- CopyInResponse
[ server ignores Sync ]
-> CopyData
[ server processes CopyData, encounters error ]
[ server exits copy-in mode ]
<- ErrorResponse
-> CopyDone
[ server ignores CopyDone ]
-> Sync
[ server processes Sync ]
<- ReadyForQuery

If the backend is canceled after the server sends CopyInResponse but
before it consumes (and ignores) the Sync, you get something like:

-> Parse + Bind + Describe + Execute + Sync
[ server processes Bind + Describe + Execute ]
[ server enters copy-in mode ]
<- CopyInResponse
[ SIGINT, server encounters error ]
<- ErrorResponse
[ server exits copy-in mode ]
[ server processes Sync ]
<- ReadyForQuery
-> CopyData
[ server ignores CopyData ]
-> CopyDone
[ server ignores CopyDone ]
-> Sync
[ server processes Sync ]
<- ReadyForQuery

The last case is not great, because I could imagine it confusing a
client, but I'm not sure about exactly how, and maybe it's something we
can document around?

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-12-16 04:08:07 Re: pg_dump versus ancient server versions
Previous Message Thomas Munro 2021-12-16 03:00:41 Re: A micro-optimisation for ProcSendSignal()