Re: pg_basebackup and wal streaming

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and wal streaming
Date: 2011-02-27 14:39:05
Message-ID: 4D6A6209.9030908@gmail.com
Lists: pgsql-hackers
On 2011-02-26 20:59, Magnus Hagander wrote:
> On Sat, Feb 26, 2011 at 20:48, Yeb Havinga<yebhavinga(at)gmail(dot)com>  wrote:
>> On 2011-02-26 18:19, Magnus Hagander wrote:
>>> Attached is an updated version of the patch that includes these
>>> changes, as well as Windows support and an initial cut at a ref page
>>> for pg_receivexlog (needs some more detail still).
>> I'm testing a bit more (with the previous version, sorry) and got the
>> following while doing a stream backup from a cluster that was at that moment
>> doing a pgbench run with 1 synchronous standby.
>>
>> mgrid(at)mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
>> Password:
>> xlog start point: 15/720000C8
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: got WAL data offset 14744, expected 16791960424        )
>> 5148915/5148026 kb g(100%) 1/1 tablespaces
>> xlog end point: 15/80568878
>> pg_basebackup: waiting for background process to finish streaming...
>> pg_basebackup: child process exited with error 1
> Hmm, strange. What platform are you on?
Both master and backup are on
Linux mg79 2.6.31-22-server #60-Ubuntu SMP Thu May 27 03:42:09 UTC 2010 
x86_64 GNU/Linux

If I run a streaming backup with an idle master server, things are OK.

I can reproduce the error by doing a pgbench run on the master (with a
syncrep standby; I don't know if that matters, other than that there is
another walsender) and then running a streaming pg_basebackup.

> I saw something similar *once* on Windows, but it then passed my tests
> a lot of times in a row so I figured it was just a "didn't clean
> properly" thing. Clearly there's a bug around.
>
> What's the size of the latest WAL file that it did work on? Is it
> 16791960424 bytes? That's way way too large, but perhaps it's not
> switching segment properly? (that value is supposedly the current
> write position in the file..)
The files from that specific run are gone.

mgrid(at)mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
Password:
xlog start point: 47/8E81F300
pg_basebackup: starting background WAL receiver
pg_basebackup: got WAL data offset 41432, expected 168186486424        )
4763109/4762428 kb g(100%) 1/1 tablespaces
xlog end point: 47/9D17FFE0
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: child process exited with error 1
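
As a side note on the start and end points in this run, here is a minimal
Python sketch that splits an xlog location into its log id, segment number
and offset within the segment. It assumes the default 16 MB segment size
and that both locations share the same log id (0x47 here); the function
and variable names are made up for illustration and are not part of
pg_basebackup.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default: 16 MB per WAL segment


def parse_xlog_location(location):
    """Split 'logid/xrecoff' (both hex) into (logid, segment, offset)."""
    logid_hex, xrecoff_hex = location.split("/")
    logid = int(logid_hex, 16)
    xrecoff = int(xrecoff_hex, 16)
    # The segment number is the position within this log id in units of
    # the segment size; the remainder is the byte offset in that segment.
    return logid, xrecoff // WAL_SEGMENT_SIZE, xrecoff % WAL_SEGMENT_SIZE


start = parse_xlog_location("47/8E81F300")
end = parse_xlog_location("47/9D17FFE0")

# Only meaningful because start and end have the same log id.
segments_crossed = end[1] - start[1]
print(start, end, segments_crossed)
```

Under those assumptions the backup above starts in segment 0x8E and ends
in segment 0x9D, so the WAL receiver had to switch segments several times
while the backup ran, which would not happen with an idle master.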

About the file sizes: every WAL file on the master is 16 MB, though
the sum of the sizes of all WAL files on the master is close to the 16G
number. Maybe a coincidence.
/dev/sdc1             20970612  16798508   4172104  81% /xlog

New initdb, pgbench with smaller db, fresh /xlog:

mgrid(at)mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
Password:
xlog start point: 0/8C6474E8
pg_basebackup: starting background WAL receiver
pg_basebackup: got WAL data offset 10488, expected 167877046397        )
1564067/1563386 kb g(100%) 1/1 tablespaces
xlog end point: 0/99007798
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: child process exited with error 1

Yes, a coincidence.

/dev/sdc1              20G  2.6G   18G  13% /xlog

Another test with no sync standby server, only the master with pgbench
running:

mgrid(at)mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
Password:
xlog start point: 0/D5002120
pg_basebackup: starting background WAL receiver
pg_basebackup: got WAL data offset 23752, expected 168009686397        )
1572173/1570348 kb g(100%) 1/1 tablespaces
xlog end point: 0/EC456968
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: child process exited with error 1
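
One possible reading of the four "got ... expected ..." lines, offered
as a sketch and not as a confirmed diagnosis: in each run the digits
printed after "expected" begin with exactly 16 MB plus the "got" offset.
That would fit a write position that was not reset when the segment
switched. The extra trailing digits are assumed here to be leftovers of
the progress line that was printed over the same terminal line. The
helper name below is hypothetical.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default: 16 MB per WAL segment

# (got offset, digits printed after "expected") copied from the four runs
runs = [
    (14744, "16791960424"),
    (41432, "168186486424"),
    (10488, "167877046397"),
    (23752, "168009686397"),
]


def matches_unwrapped_offset(got, printed):
    """True if the printed digits start with (segment size + got offset)."""
    return printed.startswith(str(WAL_SEGMENT_SIZE + got))


results = [matches_unwrapped_offset(got, printed) for got, printed in runs]
print(results)
```

All four runs match this pattern, so the huge "expected" values are
probably not real file sizes.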

After stopping pgbench and starting a base backup, it finishes correctly
after a while (with what looks like about 2 minutes of inactivity in
between).

regards,
Yeb Havinga