Skip site navigation (1) Skip section navigation (2)

Re: Online base backup from the hot-standby

From: Steve Singer <ssinger_pg(at)sympatico(dot)ca>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Jun Ishiduka <ishizuka(dot)jun(at)po(dot)ntts(dot)co(dot)jp>, magnus(at)hagander(dot)net, heikki(dot)linnakangas(at)enterprisedb(dot)com, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com, cedric(dot)villemain(dot)debian(at)gmail(dot)com
Subject: Re: Online base backup from the hot-standby
Date: 2012-01-20 04:01:49
Message-ID: BLU0-SMTP7467A0BBFE563ED7A75CDE8E870@phx.gbl (view raw or flat)
Thread:
Lists: pgsql-hackers
On 12-01-17 05:38 AM, Fujii Masao wrote:
> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com>  wrote:
>> The amount of code changes to allow pg_basebackup to make a backup from
>> the standby seems to be small. So I ended up merging that changes and the
>> infrastructure patch. WIP patch attached. But I'd happy to split the patch again
>> if you want.
> Attached is the updated version of the patch. I wrote the limitations of
> standby-only backup in the document and changed the error messages.
>

Here is my review of this verison of the patch. I think this patch has 
been in every CF for 9.2 and I feel it is getting close to being 
committed.  The only issue of significants is a crash I encountered 
while testing, see below.

I am fine with including the pg_basebackup changes in the patch it also 
makes testing some of the other changes possible.


The documentation updates you have are good

I don't see any issues looking at the code.



Testing Review
--------------------------------

I encountered this on my first replica (the one based on the master).  I 
am not sure if it is related to this patch, it happened after the 
pg_basebackup against the replica finished.

TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: 
"twophase.c", Line: 1238)
LOG:  startup process (PID 12222) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

A little earlier this postmaster had printed.

LOG:  restored log file "00000001000000000000001F" from archive
LOG:  restored log file "000000010000000000000020" from archive
cp: cannot stat 
`/usr/local/pgsql92git/archive/000000010000000000000021': No such file 
or directory
LOG:  unexpected pageaddr 0/19000000 in log file 0, segment 33, offset 0
cp: cannot stat 
`/usr/local/pgsql92git/archive/000000010000000000000021': No such file 
or directory


I have NOT been able to replicate this error  and I am not sure exactly 
what I had done in my testing prior to that point.


In another test run I had

- set full page writes=off and did a checkpoint
- Started the pg_basebackup
- set full_page_writes=on and did a HUP + some database activity that 
might have forced a checkpoint.

I got this message from pg_basebackup.
./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
pg_basebackup: could not get WAL end position from server

I point this out because the message is different than the normal "could 
not initiate base backup: FATAL:  WAL generated with 
full_page_writes=off" thatI normally see.    We might want to add a 
PQerrorMessage(conn)) to pg_basebackup to print the error details.  
Since this patch didn't actually change pg_basebackup I don't think your 
required to improve the error messages in it.  I am just mentioning this 
because it came up in testing.


The rest of the tests I did involving changing full_page_writes  
with/without checkpoints and sighups and promoting the replica seemed to 
work as expected.




> Regards,
>
>
>
>

In response to

Responses

pgsql-hackers by date

Next:From: Robert HaasDate: 2012-01-20 04:16:10
Subject: Re: triggered_change_notification v3
Previous:From: Peter GeogheganDate: 2012-01-20 03:46:54
Subject: Re: Group commit, revised

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group