Re: exactly what is COPY BOTH mode supposed to do in case of an error?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: exactly what is COPY BOTH mode supposed to do in case of an error?
Date: 2013-04-27 19:12:18
Message-ID: CA+TgmoY7Gwy0b2Cdc4rf_0_nF24roROMPdAmEiDsmZO+dcxW-w@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 27, 2013 at 6:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 27 April 2013 03:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> It seems the backend and libpq don't agree. The backend makes no
>> special provision to wait for a CopyDone message if an error occurs
>> during copy-both. It simply sends an ErrorResponse and that's it.
>> libpq, on the other hand, treats either CopyDone or ErrorResponse as a
>> cue to transition to PGASYNC_COPY_IN (see pqGetCopyData3).
>
> Well spotted, and good detective work.

Thanks.

>> I'm attaching a patch which adopts the position that the backend is
>> right and libpq is wrong. The opposite approach is also possible, but
>> I haven't tried to implement it. Or maybe there's a third way which
>> is better still.
>
> I guess if we assume this only affects replication we could change it
> at either end, not sure about that.
>
> libpq updates are much harder to roll out, so it would be better to
> assume that it is correct and the backend is wrong if we want to
> backpatch the fix.
>
> Not sure if that is a lot of work?

My feeling is that it would be better not to back-patch this, but just
fix it in master. Given the present uses of COPY-BOTH mode, the
problems seem to be limited to bad error messages, so it's arguably
not a critical bug fix. Also, I think that no matter which way we fix
it, people who upgrade the master server to a new point release, but
not pg_receivexlog, would in some unlikely cases actually experience a
regression in the quality of error messages. I would say we'd have to
live with that if the consequences of the bug were any worse than bad
error messages in the first place, but as far as I can tell they're
not. If
someone can contrive a scenario where this causes outright breakage,
that would tip the balance for me, but I don't at present see such a
hazard.

On a practical level, the main thing I didn't like about trying to fix
the server was the same issue that Tom mentioned: we'd need code in
the server to track whether COPY-BOTH mode is active and skip client
messages until we hit a CopyDone or CopyFail message. And I suspect
that code would be somewhat fragile, because having sent an
ErrorResponse already, we'd have no straightforward way to report a
further error - we'd need to report follow-on errors via NOTICE or
FATAL. Now this is not a disaster, but it's not great, either,
because there's a lot of code (including, notably, palloc) which
assumes that it can throw an ERROR whenever it likes. And in this
case, it couldn't.
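
As a rough illustration of the server-side tracking described above
(the approach argued against here), the following is a minimal sketch,
not backend code. ConnState, begin_drain and filter_client_message are
hypothetical names invented for this example; only the message type
bytes ('d' CopyData, 'c' CopyDone, 'f' CopyFail) come from the v3
frontend/backend protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Frontend message type bytes from the v3 protocol. */
#define MSG_COPY_DATA 'd'
#define MSG_COPY_DONE 'c'
#define MSG_COPY_FAIL 'f'

/* Hypothetical per-connection state; not an actual backend structure. */
typedef struct
{
    /* ErrorResponse already sent; waiting for the client to end the copy. */
    bool draining_copy_both;
} ConnState;

/* Call right after sending ErrorResponse while COPY-BOTH is active. */
static void
begin_drain(ConnState *st)
{
    assert(st != NULL);
    st->draining_copy_both = true;
}

/*
 * Returns true if the message should be processed normally, false if it
 * was swallowed while draining.  The CopyDone or CopyFail that ends the
 * drain is itself swallowed.
 */
static bool
filter_client_message(ConnState *st, char msgtype)
{
    assert(st != NULL);
    if (!st->draining_copy_both)
        return true;
    if (msgtype == MSG_COPY_DONE || msgtype == MSG_COPY_FAIL)
        st->draining_copy_both = false;
    return false;
}
```

Every code path that runs while draining_copy_both is set would have
to avoid raising a second ERROR, which is the fragility noted above.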

The second thing I didn't like about that approach was that it would
make COPY-BOTH quite asymmetrical with both COPY-OUT and COPY-IN.
That didn't seem like a great idea, either.

A further point is that the problems in the back branches are less
serious anyway, because the timeline-switching code is the only thing
that ever tries to exit COPY-BOTH mode without closing the connection,
and that's new in 9.3.

So for all those reasons, my vote is for a client-side, master-only fix.
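
For comparison, a client-side fix amounts to changing one state
transition. The sketch below is an assumption about its general shape,
not the attached patch and not libpq source: ClientState and
on_server_message are hypothetical stand-ins for libpq's PGASYNC_*
states and pqGetCopyData3. The assumed rule is that CopyDone ('c')
moves the client from copy-both to copy-in, while ErrorResponse ('E')
ends the copy outright, matching what the backend already does.

```c
#include <assert.h>
#include <stddef.h>

/* Backend message type bytes from the v3 protocol. */
#define MSG_COPY_DATA      'd'
#define MSG_COPY_DONE      'c'
#define MSG_ERROR_RESPONSE 'E'

/* Hypothetical stand-ins for libpq's PGASYNC_* connection states. */
typedef enum
{
    CLIENT_COPY_BOTH,   /* both sides may send CopyData */
    CLIENT_COPY_IN,     /* server is done sending; client may still send */
    CLIENT_COPY_ENDED   /* copy is over; collect the result or error */
} ClientState;

/* Advance the client state for one message received in copy-both mode. */
static void
on_server_message(ClientState *st, char msgtype)
{
    assert(st != NULL);
    if (*st != CLIENT_COPY_BOTH)
        return;                     /* only copy-both is modelled here */

    switch (msgtype)
    {
        case MSG_COPY_DATA:
            break;                  /* stay in copy-both */
        case MSG_COPY_DONE:
            *st = CLIENT_COPY_IN;   /* server finished its half cleanly */
            break;
        case MSG_ERROR_RESPONSE:
            *st = CLIENT_COPY_ENDED;    /* server will not wait for CopyDone */
            break;
        default:
            break;                  /* other messages are not modelled */
    }
}
```

The behavior described at the top of the thread corresponds to the
ErrorResponse case also setting CLIENT_COPY_IN.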

...Robert
