Re: Hot Backup with rsync fails at pg_clog if under load

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-10-26 14:18:06
Message-ID: CAC_2qU_kyy3yFKJ+twu0oMf+LKGh1f0nDn4ySTs6DyG3GK9vBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Oct26, 2011, at 15:12 , Simon Riggs wrote:
>> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
>>
>>> The read fails because their is no data at the location it's trying to
>>> read from, because clog hasn't been extended yet by recovery.
>>
>> You don't actually know that, though I agree it seems a reasonable
>> guess and was my first thought also.
>
> The actual error message also supports that theory. Here's the relevant
> snippet from the OP's log (Found in CA9FD2FE(dot)1D8D2%linas(dot)virbalas(at)continuent(dot)com)
>
> 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673
> 2011-09-21 13:41:05 CEST DETAIL:  Could not read from file "pg_clog/0001" at offset 32768: Success.
>
> Note that it says "Success" at the end of the second log entry. That
> can only happen, I think, if we're trying to read the page adjacent to
> the last page in the file. The seek would be successfull, and the subsequent
> read() would indicate EOF by returning zero bytes. None of the calls would
> set errno. If there was a real IO error, read() would set errno, and if the
> page wasn't adjacent to the last page in the file, seek() would set errno.
> In both cases we'd see the corresponding error messag, not "Success".

And even more pointedly, in the original go around on this:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056

He reported that clog/0000 after pg_start_backup call:
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

Changed during the rsync phase to this:
-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000

But on the slave, of course, it was copied before it was extend so it
was the original size (that's ok, that's the point of recovery after
the backup):
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

With the error:
2011-09-23 14:33:46 CEST FATAL: could not access status of transaction 37206
2011-09-23 14:33:46 CEST DETAIL: Could not read from file
"pg_clog/0000" at offset 8192: Success.

And that error happens *before* recovery even can get attempted.

And that if he copied the "recent" clog/0000 from the master, it did start up.

And I think they also reported that if they didn't run hot standby,
but just normal recovery into a new master, it didn't have the problem
either, i.e. without hotstandby, recovery ran, properly extended the
clog, and then ran as a new master fine.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan(at)highrise(dot)ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-10-26 14:37:00 Re: pgsql_fdw, FDW for PostgreSQL server
Previous Message Florian Pflug 2011-10-26 13:57:12 Re: Hot Backup with rsync fails at pg_clog if under load