Re: Hot Backup with rsync fails at pg_clog if under load

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Daniel Farina <daniel(at)heroku(dot)com>, Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Date: 2011-10-26 16:08:45
Message-ID: CA+U5nMLyW=S1DTBhswOirf_EZX06epvtyUZN364BGPJBDH_tiA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Oct26, 2011, at 15:57 , Florian Pflug wrote:
>> As you said, the CLOG page corresponding to nextXid
>> *should* always be accessible at the start of recovery (unless the whole file
>> has been removed by VACUUM, that is). So we shouldn't need to extend CLOG.
>> Yet the error suggests that the CLOG is, in fact, too short. What I said
>> is that we shouldn't apply any fix (for the CLOG problem) before we understand
>> the reason for that apparent contradiction.
>
> Ha! I think I've got a working theory.
>
> In CreateCheckPoint(), we determine the nextXid that'll go into the checkpoint
> record, and then call CheckPointGuts() which does the actual writing and fsyncing.
> So far, that's fine. If a transaction ID is assigned before we compute the
> checkpoint's nextXid, we'll extend the CLOG accordingly, and CheckPointGuts() will
> make sure the new CLOG page goes to disk.
>
> But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
> CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
> fine too, except that LogStandbySnapshot() re-assigns the *current* value of
> ShmemVariableCache->nextXid to the checkpoint's nextXid field.
>
> Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but
> before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
> whose CLOG page hasn't necessarily made it to the disk yet. The longer CheckPointGuts()
> takes to finish its work, the more likely it becomes (assuming that CLOG writing
> and syncing don't happen at the very end). This fits the OP's observation of the
> problem vanishing when pg_start_backup() does an immediate checkpoint.

This is the correct explanation. I've just come back into Wifi range
and was about to write to you with this explanation, but your original
point that nextxid must be wrong deserves credit. OTOH I was
waiting to find out what the reason for the physical read was, rather
than guessing.

Notice that the nextxid value isn't wrong, it's just not the correct
value to use for starting clog.
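
The ordering Florian describes can be sketched as a toy model. This is
not PostgreSQL code: the ToyCluster class, its method names and the
xids_assigned_after_guts parameter are all made up for illustration, and
the model ignores the case where nextXid sits exactly on a page boundary.
It only assumes that one CLOG page covers 32768 transaction IDs, which
holds for the default 8192-byte block size.

```python
# Toy model of the checkpoint ordering; names are hypothetical, not PostgreSQL APIs.
XIDS_PER_PAGE = 32768  # 2 status bits per xid on an 8192-byte page


class ToyCluster:
    """Tracks which CLOG pages exist in memory versus on disk."""

    def __init__(self, next_xid):
        self.next_xid = next_xid
        self.pages_in_memory = {next_xid // XIDS_PER_PAGE}
        self.pages_on_disk = set()

    def assign_xid(self):
        # Assigning an xid extends the in-memory CLOG to cover it.
        xid = self.next_xid
        self.pages_in_memory.add(xid // XIDS_PER_PAGE)
        self.next_xid += 1
        return xid

    def checkpoint_guts(self):
        # Stand-in for the write-and-fsync phase of the checkpoint.
        self.pages_on_disk |= self.pages_in_memory

    def create_checkpoint(self, hot_standby, xids_assigned_after_guts):
        record_next_xid = self.next_xid        # computed before the flush
        self.checkpoint_guts()
        for _ in range(xids_assigned_after_guts):
            self.assign_xid()                  # concurrent load during the window
        if hot_standby:
            # Stand-in for the standby snapshot step overwriting the field
            # with the *current* value after the flush has finished.
            record_next_xid = self.next_xid
        return record_next_xid

    def clog_page_readable(self, next_xid):
        # What recovery would need from disk to start CLOG at next_xid.
        return next_xid // XIDS_PER_PAGE in self.pages_on_disk


plain = ToyCluster(XIDS_PER_PAGE - 2)
rec = plain.create_checkpoint(hot_standby=False, xids_assigned_after_guts=4)
print(plain.clog_page_readable(rec))    # True

standby = ToyCluster(XIDS_PER_PAGE - 2)
rec = standby.create_checkpoint(hot_standby=True, xids_assigned_after_guts=4)
print(standby.clog_page_readable(rec))  # False
```

In the second run the recorded value is a perfectly valid nextXid; it just
points at a page that the flush never saw, which is the distinction above.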

As it turns out the correct fix is actually just to skip StartupCLOG()
until the end of recovery, because it does nothing useful when executed
at the start of recovery. When I wrote the original code I remember thinking
that StartupCLOG() is superfluous at that point.

Brewing a patch now.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
