Re: Make mesage at end-of-recovery less scary.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: andres(at)anarazel(dot)de
Cc: peter(dot)eisentraut(at)2ndquadrant(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Make mesage at end-of-recovery less scary.
Date: 2020-03-24 01:52:50
Message-ID: 20200324.105250.1584846618506385212.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 23 Mar 2020 12:47:36 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in
> Hi,
>
> On 2020-03-23 10:37:16 +0100, Peter Eisentraut wrote:
> > On 2020-03-05 08:06, Kyotaro Horiguchi wrote:
> > > | [20866] LOG: replication terminated by primary server
> > > | [20866] DETAIL: End of WAL reached on timeline 1 at 0/30001C8.
> > > | [20866] FATAL: could not send end-of-streaming message to primary: no COPY in progress
>
> IMO it's a bug that we see this FATAL. I seem to recall that we didn't
> use to get that?

I thought that it is a convention that A auxiliary process uses ERROR
(which is turned into FATAL in ereport) to exit, which I didn't like
so much, but it was out of scope of this patch.

As for the message bove, the FATAL is preceded by the "LOG:
replication terminated by" message, that means walreceiver tries to
send new data after disconnection just to fail, which is
unreasonable. I think we should exit immediately after detecting
disconnection. The FATAL is gone by the attached.

> > > | [24267] LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
> > > | [24267] DETAIL: invalid record length at 0/3000240: wanted 24, got 0
> >
> > Is this the before and after? That doesn't seem like a substantial
> > improvement to me. You still get the "scary" message at the end.
>
> It seems like a minor improvement - folding the DETAIL into the message
> makes sense to me here. But it indeed doesn't really address the main
> issue.
>
> I think we don't want to elide the information about how the end of the
> WAL was detected - there are some issues where I found that quite
> helpful. But we could reformulate it to be clearer that it's informative
> output, not a bug. E.g. something roughly like
>
> LOG: reached end of WAL at 0/3000240 on timeline 1 in archive during standby mode
> DETAIL: End detected due to invalid record length at 0/3000240: expected 24, got 0
> (I first elided the position in the DETAIL, but it could differ from the
> one in LOG)
>
> I don't find that very satisfying, but I can't come up with something
> that provides the current information, while being less scary than my
> suggestion?

The 0-length record is not an "invalid" state during recovery, so we
can add the message for the state as "record length is 0 at %X/%X". I
think if other states found there, it implies something wrong.

LSN is redundantly shown but I'm not sure if it is better to remove it
from either of the two lines.

| LOG: reached end of WAL at 0/3000850 on timeline 1 in pg_wal during crash recovery
| DETAIL: record length is 0 at 0/3000850

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v3-0001-Make-End-Of-Recovery-error-less-scary.patch text/x-patch 6.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-03-24 02:13:44 Re: Wait event that should be reported while waiting for WAL archiving to finish
Previous Message Fujii Masao 2020-03-24 01:46:51 Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side