Re: Hot Backup with rsync fails at pg_clog if under load

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-10-26 11:16:51
Message-ID: 6C3D7EDA-573E-46DC-9047-5FEB92876DA8@phlo.org
Lists: pgsql-hackers

On Oct 25, 2011, at 14:51, Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>
>> What I don't understand is how this affects the CLOG. How does oldestActiveXID
>> factor into CLOG initialization?
>
> It is an entirely different error.

Ah, OK. I assumed you believed that the wrong oldestActiveXID computation
explained both the SUBTRANS-related *and* the CLOG-related errors, since you
said "We are starting recovery at the right place but we are initialising
the clog and subtrans incorrectly" at the start of the mail.

> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextxid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.

Yep.

> That isn't a Hot Standby problem, a recovery problem nor is it certain
> it's a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably
not pilot error, since it happens even for backups taken with pg_basebackup.
So the only explanations other than a PostgreSQL bug are broken hardware or
a pretty serious kernel/filesystem bug.

> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in
the first place, I think it's a bad idea to silence them.

best regards,
Florian Pflug
