Re: warning message in standby

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: warning message in standby
Date: 2010-06-14 11:49:34
Message-ID: 4C16174E.6020004@enterprisedb.com
Lists: pgsql-hackers

On 14/06/10 13:16, Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>> On 12/06/10 04:19, Bruce Momjian wrote:
>>> Robert Haas wrote:
>>>>> If my streaming replication stops working, I want to know about it as
>>>>> soon as possible. WARNING just doesn't cut it.
>>>>>
>>>>> This needs some better thought.
>>>>>
>>>>> If we PANIC, then surely it will PANIC again when we restart unless we
>>>>> do something. So we can't do that. But we need to do something better
>>>>> than
>>>>>
>>>>> WARNING there is a bug that will likely cause major data loss
>>>>> HINT you'll be sacked if you miss this message
>>>>
>>>> +1. I was making this same argument (less eloquently) upthread.
>>>> I particularly like the errhint().
>>>
>>> I am wondering what action would be most likely to get the
>>> administrator's attention.
>>
>> I've committed the patch to disconnect the SR connection in that case.
>> If the message needs improvement, let's do that separately once we
>> figure out what to do.
>>
>> Seems like we need something like WARNING that doesn't cause the process
>> to die, but is more alarming, like ERROR/FATAL/PANIC. Or maybe just adding a
>> hint to the warning will do. How about
>>
>> WARNING: invalid record length at 0/4005330
>> HINT: An invalid record was streamed from master. That can be a sign of
>> corruption in the master, or inconsistency between master and standby
>> state. The record will be re-fetched, but that is unlikely to fix the
>> problem. You may have to restore the standby from a base backup.
>
> I am thinking about log monitoring tools like Nagios. I am afraid
> they are never going to pick up something tagged WARNING, no matter
> what the wording is.
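To make the monitoring concern concrete, here is a minimal Python sketch of a line-oriented log scanner that alerts only on ERROR, FATAL, or PANIC. The alert policy, the severity regex, and the name alerting_lines are assumptions for illustration. This is not how Nagios or any particular check plugin is actually configured, and real PostgreSQL log lines usually carry a log_line_prefix before the severity tag.

```python
import re
from typing import Iterable, List

# Hypothetical alert policy: only these severities page an administrator.
ALERT_LEVELS = {"ERROR", "FATAL", "PANIC"}

# Finds the first "SEVERITY:" tag on a line, skipping any log_line_prefix before it.
# A sketch only: a message body that itself starts a line with e.g. "ERROR:" could fool it.
SEVERITY_RE = re.compile(
    r"\b(DEBUG[1-5]?|INFO|NOTICE|WARNING|ERROR|LOG|FATAL|PANIC"
    r"|DETAIL|HINT|CONTEXT|STATEMENT):"
)


def alerting_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that the threshold-based policy above would alert on."""
    hits = []
    for line in lines:
        match = SEVERITY_RE.search(line)
        if match and match.group(1) in ALERT_LEVELS:
            hits.append(line)
    return hits


warning_only = [
    "WARNING:  invalid record length at 0/4005330",
    "HINT:  An invalid record was streamed from master.",
]
with_fatal = warning_only + [
    "FATAL:  walreceiver killed because of error in WAL stream",
]

print(alerting_lines(warning_only))  # []
print(alerting_lines(with_fatal))    # ['FATAL:  walreceiver killed because of error in WAL stream']
```

Under that policy the WARNING/HINT pair alone produces no alert, however it is worded, while adding a single FATAL-tagged line to the same stream does.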

One idea is for the startup process to signal the walreceiver process to
commit suicide with FATAL, instead of just dying silently as it does
now. So you'd get a WARNING explaining how the record was corrupt,
followed by a FATAL from the walreceiver process:

WARNING: invalid record length at 0/4005330
FATAL: walreceiver killed because of error in WAL stream
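The proposal above can be sketched with plain POSIX processes. In this minimal Python example (POSIX-only, since it uses os.fork), a child process stands in for walreceiver and the parent stands in for the startup process. The names run_receiver and startup_detects_bad_record, the choice of SIGTERM, and the pipe handshake are assumptions for illustration. PostgreSQL's actual signalling between these processes and its ereport() machinery are not reproduced here.

```python
import os
import signal
import sys
import time

FATAL_MSG = "FATAL:  walreceiver killed because of error in WAL stream"


def run_receiver(ready_fd: int) -> None:
    """Child process: hypothetical stand-in for walreceiver."""
    stop_requested = False

    def on_sigterm(signum, frame):
        nonlocal stop_requested
        stop_requested = True  # just record the request; act on it in the main loop

    signal.signal(signal.SIGTERM, on_sigterm)
    os.write(ready_fd, b"x")  # tell the parent the handler is installed
    os.close(ready_fd)
    while not stop_requested:
        time.sleep(0.01)  # placeholder for "receive WAL and write it to disk"
    # Die loudly rather than silently, so log monitoring has a FATAL line to match.
    print(FATAL_MSG, flush=True)
    os._exit(1)


def startup_detects_bad_record() -> int:
    """Parent process: hypothetical stand-in for the startup process.

    Starts the receiver, "finds" a corrupt record, logs a WARNING and asks the
    receiver to exit. Returns the receiver's exit status.
    """
    sys.stdout.flush()  # avoid the child re-emitting buffered parent output
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(read_fd)
        try:
            run_receiver(write_fd)
        finally:
            os._exit(2)  # never fall through into the parent's code path
    os.close(write_fd)
    os.read(read_fd, 1)  # block until the child's SIGTERM handler is in place
    os.close(read_fd)
    print("WARNING:  invalid record length at 0/4005330", flush=True)
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("receiver exit status:", startup_detects_bad_record())
```

Run as a script, this should print the WARNING line from the parent, then the FATAL line from the child, then an exit status of 1. The handler only sets a flag and the FATAL message is written from the main loop, which keeps the handler trivial. The pipe handshake avoids a race in which the signal arrives before the handler is installed and the child dies silently under the default SIGTERM action, which is exactly the behavior the proposal wants to avoid.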

> Crazy idea, but can we force a fatal error line
> into the logs with something like "WARNING ...\nFATAL: ...".

Yeah, that's crazy.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com