From: | Florian Pflug <fgp(at)phlo(dot)org> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hot Backup with rsync fails at pg_clog if under load |
Date: | 2011-10-26 11:16:51 |
Message-ID: | 6C3D7EDA-573E-46DC-9047-5FEB92876DA8@phlo.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Oct25, 2011, at 14:51 , Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>
>> What I don't understand is how this affects the CLOG. How does oldestActiveXID
>> factor into CLOG initialization?
>
> It is an entirely different error.
Ah, OK. I assumed that you believe the wrong oldestActiveXID computation
solved both the SUBTRANS-related *and* the CLOG-related errors, since you
said "We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly" at the start of the mail.
> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextxid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.
Yep.
> That isn't a Hot Standby problem, a recovery problem nor is it certain
> its a PostgreSQL problem.
It's very likely that it's a PostgreSQL problem, though. It's probably
not a pilot error since it happens even for backups taken with pg_basebackup(),
so the only explanation other than a PostgreSQL bug is broken hardware or
a pretty serious kernel/filesystem bug.
> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?
As long as we don't understand how the CLOG-related errors happen in
the first place, I think it's a bad idea to silence them.
best regards,
Florian Pflug
From | Date | Subject | |
---|---|---|---|
Next Message | Florian Pflug | 2011-10-26 11:26:54 | Re: Hot Backup with rsync fails at pg_clog if under load |
Previous Message | Simon Riggs | 2011-10-26 09:38:14 | Re: TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified |