Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Valentine Gogichashvili <valgog(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
Date: 2010-08-13 01:21:07
Message-ID: AANLkTi=MUYA8NuWarp2Gcb04-yvKWP8-29RwU1y7=iPi@mail.gmail.com
Lists: pgsql-bugs

On Thu, Aug 12, 2010 at 11:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> (based on Simon's suggestion)
>> 1. run pg_start_backup() on master.
>> 2. copy backup_label from master to temporary area.
>>    copying backup_label directly to standby would generate another
>>    weakness (e.g., what if standby is restarted while backup_label
>>    exists in standby?), so backup_label should be copied to elsewhere
>>    than standby.
>> 3. wait for "Latest checkpoint's REDO location" which pg_controldata
>>    on standby returns, to reach or exceed "START WAL LOCATION" in
>>    backup_label copied in the step 2. This would take long, but we
>>    can run checkpoint on standby to shorten waiting time.
>
> Hm, can you actually execute CHECKPOINT on a HS slave?

Yes.

>  Is it guaranteed
> to cause a restartpoint to be created?

CHECKPOINT on a HS slave creates a restartpoint only when there
is a CHECKPOINT record that has already been replayed but has not
yet produced a restartpoint. Such a CHECKPOINT record is expected
to exist after step 2 because pg_start_backup generates one in
step 1. So executing CHECKPOINT on a HS slave at step 3 would
almost certainly create a restartpoint.

But in the file-based log shipping case, it might take a long
time to ship that CHECKPOINT record. So we might need to execute
pg_switch_xlog() on the master before executing CHECKPOINT on the
slave.
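
For illustration, the comparison in step 3 could be scripted roughly
as below. This is only a sketch under assumptions: the function names
are made up for this example, the inputs are the text printed by
pg_controldata on the standby and the contents of the backup_label
copied in step 2, and WAL locations are compared as (high, low) pairs
of hex numbers.

```python
import re

_LSN = r"([0-9A-Fa-f]+/[0-9A-Fa-f]+)"


def parse_lsn(text):
    """Turn an 'X/Y' hex WAL location into a (high, low) tuple that sorts correctly."""
    high, low = text.strip().split("/")
    return (int(high, 16), int(low, 16))


def redo_location(controldata_output):
    """Extract "Latest checkpoint's REDO location" from pg_controldata output."""
    m = re.search(r"^Latest checkpoint's REDO location:\s*" + _LSN,
                  controldata_output, re.MULTILINE)
    if m is None:
        raise ValueError("REDO location not found in pg_controldata output")
    return parse_lsn(m.group(1))


def start_wal_location(backup_label_text):
    """Extract "START WAL LOCATION" from the contents of backup_label."""
    m = re.search(r"^START WAL LOCATION:\s*" + _LSN,
                  backup_label_text, re.MULTILINE)
    if m is None:
        raise ValueError("START WAL LOCATION not found in backup_label")
    return parse_lsn(m.group(1))


def standby_has_caught_up(controldata_output, backup_label_text):
    """True once the standby's REDO location reaches or exceeds the backup start."""
    return redo_location(controldata_output) >= start_wal_location(backup_label_text)
```

A wrapper script would call standby_has_caught_up() in a loop,
sleeping between attempts, and proceed to step 4 only once it
returns True.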

>> 4. run backup on standby
>> 5. run pg_stop_backup() on master
>> 6. copy backup_label from temporary area to backup
>
>> Is this procedure still unsafe?
>
> This still isn't doing anything to address the problem I'm worried
> about, which is when does the copy actually reach consistency.  The
> above procedure might guarantee that it eventually will reach
> consistency, but you don't know when it has.

Once the new standby, started from the backup taken from another
standby, has reached the backup end location (i.e., it has read
the XLOG_BACKUP_END record generated by pg_stop_backup in
step 5), we can consider the database to have reached consistency.
Since the new standby doesn't accept client connections until
then, we can ensure that users will not access an inconsistent
database.
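
As a toy model of that rule (not how recovery is actually
implemented), the gating can be pictured as below. The function
names are hypothetical, and every record type string other than
XLOG_BACKUP_END is just a placeholder.

```python
def first_consistent_index(replayed_records):
    """Return the index of the first XLOG_BACKUP_END record, or None if not seen yet."""
    for index, record_type in enumerate(replayed_records):
        if record_type == "XLOG_BACKUP_END":
            return index
    return None


def may_accept_connections(replayed_records):
    """In this model, clients are admitted only after the backup end record is replayed."""
    return first_consistent_index(replayed_records) is not None
```

With this model, a standby that has replayed only records preceding
XLOG_BACKUP_END keeps refusing connections, however much WAL it
has already applied.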

Regards,

PS. I'll be unable to read hackers from Aug 13th to 20th because of
a vacation.

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
