From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: warning message in standby |
Date: | 2010-06-11 04:18:54 |
Message-ID: | AANLkTimyzbyx2LnOB1T15v46elaYq6IHqUe-JzD3DWvu@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jun 11, 2010 at 1:01 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> We're talking about a corrupt record (incorrect CRC, incorrect backlink
> etc.), not errors within redo functions. During crash recovery, a corrupt
> record means you've reached end of WAL. In standby mode, when streaming WAL
> from master, that shouldn't happen, and it's not clear what to do if it
> does. PANIC is not a good idea, at least if the server uses hot standby,
> because that only makes the situation worse from an availability point of view.
> So we log the error as a WARNING, and keep retrying. It's unlikely that the
> problem will just go away, but we keep retrying anyway in the hope that it
> does. However, it seems that we're too aggressive with the retries.
Right. The attached patch calms down the retries: if we find an invalid
record while streaming WAL from the master, we sleep for 5 seconds (should
this be shorter?) before retrying to replay the record at the same location
where the invalid one was found. Comments?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
calm_down_retries_v1.patch | application/octet-stream | 823 bytes |