From: | Florian Pflug <fgp(at)phlo(dot)org> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: recovery getting interrupted is not so unusual as it used to be |
Date: | 2010-06-03 02:34:44 |
Message-ID: | 797003F7-F352-4F71-BC98-A9F1F1978457@phlo.org |
Lists: | pgsql-hackers |
On Jun 3, 2010, at 3:31, Robert Haas wrote:
> On Wed, Jun 2, 2010 at 9:07 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Jun 3, 2010, at 0:58, Robert Haas wrote:
>>> But maybe the message isn't right the first time either. After all
>>> the point of having a write-ahead log in the first place is that we
>>> should be able to prevent corruption in the event of an unexpected
>>> shutdown. Maybe the right thing to do is to forget about adding a new
>>> state and just remove or change the errhint from these messages:
>>
>> You've fallen prey to a (very common) misinterpretation of this message. It is not about corruption *caused* by a crash during recovery; it's about corruption *causing* the crash.
>>
>> I'm not in favor of getting rid of that message entirely, since it produces a worthwhile hint if the crash really was caused by corrupt data. But it desperately needs better wording that makes cause and effect perfectly clear. That even you misread it conclusively proves that.
>>
>> How about
>> "If this has happened repeatedly and without manual intervention, it was probably caused by corrupted data and you may need to restore from backup"
>> for the crash recovery case and
>> "If this has happened repeatedly and without manual intervention, it was probably caused by corrupted data and you may need to choose an earlier recovery target"
>> for the PITR case.
>
> Oh. Well, if that's the case, then I guess I lean toward applying the
> patch as-is. Then there's no need for the caveat "and without manual
> intervention".
That still leaves the messages awfully ambiguous concerning the cause (data corruption) and the effect (crash during recovery).
How about
"If this has occurred more than once, it is probably caused by corrupt data and you have to use the latest backup for recovery"
for the crash recovery case and
"If this has occurred more than once, it is probably caused by corrupt data and you have to choose an earlier recovery target"
for the PITR case.
I don't see why only the PITR case currently includes the "more than once" clause. It's probably meant to avoid needlessly alarming the user if the "crash" was in fact a stray SIGKILL or an out-of-memory condition, which seems equally likely in both cases.
best regards,
Florian Pflug