Re: pg_basebackup and wal streaming

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and wal streaming
Date: 2011-02-27 14:39:05
Message-ID: 4D6A6209.9030908@gmail.com
Lists: pgsql-hackers

On 2011-02-26 20:59, Magnus Hagander wrote:
> On Sat, Feb 26, 2011 at 20:48, Yeb Havinga<yebhavinga(at)gmail(dot)com> wrote:
>> On 2011-02-26 18:19, Magnus Hagander wrote:
>>> Attached is an updated version of the patch that includes these
>>> changes, as well as Windows support and an initial cut at a ref page
>>> for pg_receivexlog (needs some more detail still).
>> I'm testing a bit more (with the previous version, sorry) and got the
>> following while doing a stream backup from a cluster that was at that moment
>> doing a pgbench run with 1 synchronous standby.
>>
>> mgrid(at)mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
>> Password:
>> xlog start point: 15/720000C8
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: got WAL data offset 14744, expected 16791960424 )
>> 5148915/5148026 kb g(100%) 1/1 tablespaces
>> xlog end point: 15/80568878
>> pg_basebackup: waiting for background process to finish streaming...
>> pg_basebackup: child process exited with error 1
> Hmm, strange. What platform are you on?
Both master and backup are on
Linux mg79 2.6.31-22-server #60-Ubuntu SMP Thu May 27 03:42:09 UTC 2010
x86_64 GNU/Linux

If I run a streaming backup against an idle master server, things are ok.

I can reproduce the error by doing a pgbench run on the master (with a
syncrep standby; I don't know if that's of importance, other than that there
is another walsender) and then running a streaming pg_basebackup.

> I saw something similar *once* on Windows, but it then passed my tests
> a lot of times in a row so I figured it was just a "didn't clean
> properly" thing. Clearly there's a bug around.
>
> What's the size of the latest WAL file that it did work on? Is it
> 16791960424 bytes? That's way, way too large, but perhaps it's not
> switching segments properly? (That value is supposedly the current
> write position in the file.)
The files from that specific run are gone.

mgrid(at)mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
Password:
xlog start point: 47/8E81F300
pg_basebackup: starting background WAL receiver
pg_basebackup: got WAL data offset 41432, expected 168186486424 )
4763109/4762428 kb g(100%) 1/1 tablespaces
xlog end point: 47/9D17FFE0
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: child process exited with error 1

About the file sizes: every WAL file on the master is 16 MB, though
the total size of all WAL files on the master is close to the 16 GB
figure. Maybe a coincidence...
/dev/sdc1 20970612 16798508 4172104 81% /xlog
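For reference, the reasoning above (each WAL file is 16 MB, so a write position of 16791960424 cannot be an offset inside one segment, and the receiver may not be switching segments) can be sketched as below. This is a minimal illustration assuming 16 MB segments and a receiver that compares the in-segment offset of incoming data with its own write position; the helper names `seg_offset`, `offset_ok` and `advance` are hypothetical and are not pg_basebackup's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 MB WAL segments, as observed on the master in this thread. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Hypothetical: byte offset of a WAL position (the part after the '/')
 * within its segment, e.g. 0x720000C8 from "15/720000C8". */
static uint64_t seg_offset(uint32_t xrecoff)
{
    return xrecoff % WAL_SEG_SIZE;
}

/* Hypothetical receiver check: incoming data must start exactly at the
 * current write position in the open segment file. A write position at
 * or beyond WAL_SEG_SIZE can never be valid for a single segment. */
static int offset_ok(uint32_t xrecoff, uint64_t write_pos)
{
    if (write_pos >= WAL_SEG_SIZE)
        return 0;
    return seg_offset(xrecoff) == write_pos;
}

/* Hypothetical bookkeeping after writing len bytes: once the segment is
 * full, the receiver must switch to the next file and restart at 0.
 * Sets *switched to 1 when that happens. */
static uint64_t advance(uint64_t write_pos, uint64_t len, int *switched)
{
    assert(write_pos + len <= WAL_SEG_SIZE); /* never write past a segment */
    write_pos += len;
    *switched = (write_pos == WAL_SEG_SIZE);
    return *switched ? 0 : write_pos;
}
```

With the start point 15/720000C8, `seg_offset(0x720000C8)` is 0xC8, while a write position of 16791960424 fails `offset_ok` outright; if `advance` never resets to 0, the position keeps growing across segments, which would match the "not switching segments" hypothesis.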

New initdb, pgbench with smaller db, fresh /xlog:

mgrid(at)mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
Password:
xlog start point: 0/8C6474E8
pg_basebackup: starting background WAL receiver
pg_basebackup: got WAL data offset 10488, expected 167877046397 )
1564067/1563386 kb g(100%) 1/1 tablespaces
xlog end point: 0/99007798
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: child process exited with error 1

Yes, a coincidence.

/dev/sdc1 20G 2.6G 18G 13% /xlog

Another test with no sync standby server, only the master with pgbench
running:

mgrid(at)mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
Password:
xlog start point: 0/D5002120
pg_basebackup: starting background WAL receiver
pg_basebackup: got WAL data offset 23752, expected 168009686397 )
1572173/1570348 kb g(100%) 1/1 tablespaces
xlog end point: 0/EC456968
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: child process exited with error 1

After stopping pgbench and starting a base backup, it finishes correctly
after a while (with what looks like about 2 minutes of inactivity in
between).

regards,
Yeb Havinga
