RE: BUG #15078: Unable to receive data from WAL Stream Error

From: "Kolla, Mahesh" <Mahesh(dot)Kolla(at)transunion(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Eric Radman <ericshane(at)eradman(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: BUG #15078: Unable to receive data from WAL Stream Error
Date: 2018-02-26 21:49:33
Message-ID: 5745a050aefd493f838a8c17516454a3@transunion.com
Lists: pgsql-bugs

Hello Tomas,

Thank you for the suggestions.

We are not getting any other errors in our standby database. This error is written directly to postgres.log by the logger process. We are not running any shell script for it.

It is showing "dev/null: No such file or directory", which may be because of this command in the recovery.conf file:
restore_command='cp /archive/%f %p 2>/dev/null'
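
One possible reading, offered as an assumption rather than a confirmed diagnosis: the logged message names "dev/null" without a leading slash, which is what a shell prints when the redirect target is a relative path and no "dev" directory exists in the current working directory. The sketch below reproduces that difference in a scratch directory; the variable names (scratch, relative, absolute) are only for illustration and nothing here touches the real archive or recovery.conf.

```shell
# Empty scratch directory, so that no ./dev directory exists in it.
scratch=$(mktemp -d)

# Relative target: the shell looks for ./dev/null and the redirection fails.
# The outer 2>/dev/null only hides the inner shell's error message.
if (cd "$scratch" && sh -c 'true 2>dev/null') 2>/dev/null; then
    relative=ok
else
    relative=failed
fi

# Absolute target: the null device exists, so the redirection succeeds.
if (cd "$scratch" && sh -c 'true 2>/dev/null'); then
    absolute=ok
else
    absolute=failed
fi

echo "relative: $relative, absolute: $absolute"
rmdir "$scratch"
```

If the restore_command actually in effect on the standby has the relative form, that would account for the "sh: dev/null" lines; it would not by itself explain the WAL stream error.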

Please let me know whether this gives any clue to the problem.

Thank you
Mahesh Kolla

-----Original Message-----
From: Tomas Vondra [mailto:tomas(dot)vondra(at)2ndquadrant(dot)com]
Sent: Sunday, February 25, 2018 6:45 PM
To: Kolla, Mahesh <Mahesh(dot)Kolla(at)transunion(dot)com>; Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Eric Radman <ericshane(at)eradman(dot)com>; pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15078: Unable to receive data from WAL Stream Error

On 02/22/2018 07:43 PM, Kolla, Mahesh wrote:
> Hello Tom ,
>
> Please let me know why we are getting below associated LOGS saying
> invalid resource manager with the FATAL message Unable to receive data from WAL Stream Error
>
>> sh: dev/null: No such file or directory
>> sh: dev/null: No such file or directory
>> < 2018-02-19 01:37:48.860 CST > LOG: invalid resource manager ID 48 at 15/2D69E848
>

I believe that essentially means the WAL is corrupted in some way, possibly due to a network issue. I don't think I've seen such an error message though, so I'm not sure.

FWIW it's really hard to investigate issues when you only copy three lines, two of which are errors in your shell script. That provides no context whatsoever.

> Please kindly suggest us a value for tcp_keepalives_idle as it is
> presently set to 0
>

That really depends on your networking configuration, but you can try this:

tcp_keepalives_idle = 60
tcp_keepalives_interval = 15
tcp_keepalives_count = 3

which essentially pings the server after 60 seconds of inactivity. If the server does not respond within 15 seconds it'll try again, and it will consider the connection gone after 3 failures.
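
To put a number on that, here is a small sketch of the arithmetic. It assumes Linux-style keepalive behaviour, where the first probe goes out after the idle period and each unanswered probe is followed by another one after the interval. The variable names are only for illustration; the values are the three settings above.

```shell
# Values copied from the suggested settings.
idle=60       # tcp_keepalives_idle: seconds of inactivity before the first probe
interval=15   # tcp_keepalives_interval: seconds to wait after an unanswered probe
count=3       # tcp_keepalives_count: unanswered probes before giving up

# Rough worst case: the idle period plus one interval per lost probe.
detect_seconds=$((idle + interval * count))
echo "a dead connection is noticed after roughly ${detect_seconds} seconds"
```

The exact timing depends on the operating system's TCP stack, so treat the result as an estimate.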

But it's unclear if this really is a networking issue, so hard to say if this improves the situation.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
