Re: walsender termination error messages worse in v10

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: walsender termination error messages worse in v10
Date: 2017-06-02 20:57:37
Message-ID: 2d132a85-1043-ba5f-3e23-131eb279df14@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

On 02/06/17 20:51, Andres Freund wrote:
> Hi,
>
> commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
> Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
> Date: 2017-03-23 08:36:36 -0400
>
> Logical replication support for initial data copy
>
> made walreceiver emit worse messages in v10 than before when the master
> gets shut down. Before 10 you'll get something like:
>
> 2017-06-02 11:46:07.173 PDT [16143][] LOG: replication terminated by primary server
> 2017-06-02 11:46:07.173 PDT [16143][] DETAIL: End of WAL reached on timeline 1 at 0/1B7FB8C8.
> 2017-06-02 11:46:07.173 PDT [16143][] FATAL: could not send end-of-streaming message to primary: no COPY in progress
>
> the last bit is a bit superflous, but it still kinda makes sense. Now
> you get:
> 2017-06-02 11:47:09.362 PDT [17061][] FATAL: unexpected result after CommandComplete: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> the reason is that
>
> static int
> libpqrcv_receive(char **buffer, pgsocket *wait_fd)
> {
>
> previously did:
> if (rawlen == -1) /* end-of-streaming or error */
> {
> PGresult *res;
>
> res = PQgetResult(streamConn);
> if (PQresultStatus(res) == PGRES_COMMAND_OK ||
> PQresultStatus(res) == PGRES_COPY_IN)
> {
> PQclear(res);
> return -1;
> }
> else
> {
> PQclear(res);
> ereport(ERROR,
> (errmsg("could not receive data from WAL stream: %s",
> PQerrorMessage(streamConn))));
> }
> }
>
> and now does
>
> if (rawlen == -1) /* end-of-streaming or error */
> {
> PGresult *res;
>
> res = PQgetResult(conn->streamConn);
> if (PQresultStatus(res) == PGRES_COMMAND_OK)
> {
> PQclear(res);
>
> /* Verify that there are no more results */
> res = PQgetResult(conn->streamConn);
> if (res != NULL)
> ereport(ERROR,
> (errmsg("unexpected result after CommandComplete: %s",
> PQerrorMessage(conn->streamConn))));
> return -1;
> }
> else if (PQresultStatus(res) == PGRES_COPY_IN)
> {
> PQclear(res);
> return -1;
> }
> else
> {
> PQclear(res);
> ereport(ERROR,
> (errmsg("could not receive data from WAL stream: %s",
> pchomp(PQerrorMessage(conn->streamConn)))));
> }
> }
>
> note the new /* Verify that there are no more results */ bit.
>
> I don't understand why the new block is there, nor does the commit
> message explain it.
>

Hmm, that particular change can actually be reverted. It was needed for
one those custom replication commands which were replaced by normal
query support. I have missed it during the rewrite.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2017-06-02 21:05:44 Re: PG 10 release notes
Previous Message Petr Jelinek 2017-06-02 20:44:55 Re: Why does logical replication launcher set application_name?