Re: Online base backup from the hot-standby

From: Steve Singer <ssinger_pg(at)sympatico(dot)ca>
To: Jun Ishiduka <ishizuka(dot)jun(at)po(dot)ntts(dot)co(dot)jp>
Cc: masao(dot)fujii(at)gmail(dot)com, magnus(at)hagander(dot)net, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Online base backup from the hot-standby
Date: 2011-07-10 16:20:39
Message-ID: BLU0-SMTP55C661AD49484CEECF665D8E420@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11-07-07 09:22 PM, Jun Ishiduka wrote:
>> As you proposed, adding new field which stores the backup end location
>> taken from minRecoveryPoint, into pg_control sounds good idea.
> Update patch.
Here is a review of the updated patch

This version of the patch adds a field into pg_controldata that tries to
store the source of the base backup while in recovery mode.
I think your ultimate goal with this patch is to be able to take a
backup of a running hot-standby slave and recover it as another
instance. This patch seems to provide the ability to have the second
slave stop recovery at minRecoveryPoint from the control file.

My understanding of the procedure you want to get to to take base
backups off a slave is

1. execute pg_start_backup('x') on the slave (*)
2. take a backup of the data dir
3. call pg_stop_backup() on the slave
4. Copy the control file on the slave

This patch only addresses the recovery portions.

* - I think your goal is to be able to run pg_start_backup() on the
slave, the patch so far doesn't do this. If your goal was require this
to be run on the master, then correct me.

Code Review
A few comments on the code

> *** postgresql/src/include/catalog/pg_control.h 2011-06-30
> 10:04:48.000000000 +0900
> --- postgresql_with_patch/src/include/catalog/pg_control.h 2011-07-07
> 18:23:56.000000000 +0900
> ***************
> *** 142,147 ****
> --- 142,157 ----
> XLogRecPtr backupStartPoint;
> /*
> + * Postgres keeps where to take a backup server.
> + *
> + * backupserver is "none" , "master" or "slave", its default is "none".
> + * When postgres starts and it is "none", it is updated to either
> "master"
> + * or "slave" with minRecoveryPoint of the backup server.
> + * When postgres reaches backup end location, it is updated to "none".
> + */
> + int backupserver;
> +
> + /*
> * Parameter settings that determine if the WAL can be used for archival
> * or hot standby.
> */

I don't think the above comment is very clear on what backupserver is.

* backupserver is used while postgresql is in recovery mode to
* store the location of where the backup comes from.
* When Postgres starts recovery operations
* it is set to "none". During recovery it is updated to either "master",
or "slave"
* When recovery operations finish it is updated back to "none"

Also shouldn't backupServer be the enum type of 'BackupServer' not int?
Other enums in the structure such as DBState are defined this way.

Testing Review

Since I can't yet call pg_start_backup or pg_stop_backup() on the slave
I am calling them on the master.
(I also did some testing where I didn't put the system into backup
mode). I admit that I am not sure what to look for as an indication that
the system isn't recovering to the correct point. In much of my testing
I was just verifying that the slave started and my data 'looked' okay.

I seem to get this warning in my logs when I start up the instance based
on the slave backup.
LOG: 00000: database system was interrupted while in recovery at log
time 2011-07-08 18:40:20 EDT
HINT: If this has occurred more than once some data might be corrupted
and you might need to choose an earlier recovery target

I'm wondering if this warning is a bit misleading to users because it is
an expected message when starting up an instance based on a slave backup
(because the slave was already in recovery mode). If I shutdown this
instance and start it up again I keep getting the warning. My
understanding of your patch is that there shouldn't be any risk of
corruption in that case (assuming your patch has no bugs). Can/should we
be suppressing this message when we detect that we are recovering from a
slave backup?

The direction of the patch has changed a bit during this commit fest. I
think it would be good to provide an update on the rest of the changes
you plan for this to be a complete useable feature. That would make it
easier to comment on something you
missed versus something your planning on dealing with in the next stage.


> Regards.
> --------------------------------------------
> Jun Ishizuka
> NTT Software Corporation
> TEL:045-317-7018
> E-Mail: ishizuka(dot)jun(at)po(dot)ntts(dot)co(dot)jp
> --------------------------------------------

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-07-10 16:51:15 Re: Enhanced psql in core?
Previous Message Florian Pflug 2011-07-10 11:06:29 Re: Need help understanding pg_locks