Re: pg_basebackup -x stream from the standby gets stuck

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup -x stream from the standby gets stuck
Date: 2012-02-22 16:02:39
Message-ID: CABUevExLkb=bXzyZcBNnmNZ982rn0G-E-2OoHj=a240EtL4VsQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>> =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>> xlog start point: 2/AC4E2600
>> pg_basebackup: starting background WAL receiver
>> 692447/692447 kB (100%), 1/1 tablespace
>> xlog end point: 2/AC4E2600
>> pg_basebackup: waiting for background process to finish streaming...
>> pg_basebackup: base backup completed
>>
>> real    3m56.237s
>> user    0m0.224s
>> sys     0m0.936s
>>
>> (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)
>
> The above article points out the problem of pg_basebackup from the standby:
> when "-x stream" is specified, pg_basebackup from the standby gets stuck if
> there is no traffic in the database.
>
> When "-x stream" is specified, pg_basebackup forks the background process
> for receiving WAL records during backup, takes an online backup and waits for
> the background process to end. The forked background process keeps receiving
> WAL records, and whenever it reaches end of WAL file, it checks whether it has
> already received all WAL files required for the backup, and exits if yes. Which
> means that at least one WAL segment switch is required for pg_basebackup with
> "-x stream" option to end.
>
> In the backup from the master, WAL file switch always occurs at both start and
> end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
> above logic works fine even if there is no traffic. OTOH, in the backup from the
> standby, while there is no traffic, WAL file switch is not performed at all. So
> in that case, there is no chance that the background process reaches the end of
> a WAL file, checks whether all required WAL has arrived, and exits. In the end,
> pg_basebackup gets stuck.
>
> To fix the problem, I'd propose to change the background process so that it
> checks whether all required WAL has arrived, every time data is received, even
> if end of WAL file is not reached. Patch attached. Comments?

This seems like a good thing in general.

Why does it need to modify pg_receivexlog, though? I thought only
pg_basebackup had this issue?

I guess it is only because of the API change to
stream_continue_callback? Looking at it after your patch,
stream_continue_callback and segment_finish_callback are the same.
Should we perhaps just fold them into a single
stream_continue_callback, since you had to move the "detect segment
end" logic to the caller anyway?
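
To make the suggestion concrete, here is a rough sketch of what a single
folded callback could look like. It is not taken from the patch: the
WalPos type, the 16 MB segment size, the "return true to stop" convention,
the segment_finished parameter and the simulate_stream() loop are all
assumptions made for illustration. It only shows why checking on every
received chunk lets the receiver exit without a segment switch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not the real XLogRecPtr. */
typedef uint64_t WalPos;

/* Assumed segment size, used only to detect a boundary in this sketch. */
#define WAL_SEG_SIZE ((WalPos) 16 * 1024 * 1024)

/*
 * One callback for both cases. The caller invokes it each time data is
 * received and sets segment_finished when pos sits on a segment boundary.
 * Assumed convention: returning true means "stop streaming".
 */
typedef bool (*stream_continue_callback) (WalPos pos, bool segment_finished,
                                          void *arg);

/* Proposed behaviour: stop as soon as the backup end point is reached. */
static bool
stop_at_backup_end(WalPos pos, bool segment_finished, void *arg)
{
    const WalPos *end = arg;

    (void) segment_finished;    /* not needed for this policy */
    return pos >= *end;
}

/* Old behaviour: the end point is only checked at a segment boundary. */
static bool
stop_at_segment_end(WalPos pos, bool segment_finished, void *arg)
{
    const WalPos *end = arg;

    return segment_finished && pos >= *end;
}

/*
 * Toy receive loop: advance by fixed-size chunks and ask the callback
 * after each one. Returns the number of chunks consumed before the
 * callback asked to stop, or -1 if it never did within max_chunks.
 */
static int
simulate_stream(WalPos start, WalPos chunk, int max_chunks,
                stream_continue_callback cb, void *arg)
{
    WalPos pos = start;

    for (int i = 1; i <= max_chunks; i++)
    {
        pos += chunk;
        bool segment_finished = (pos % WAL_SEG_SIZE) == 0;

        if (cb(pos, segment_finished, arg))
            return i;
    }
    return -1;
}
```

With an end point a couple of chunks past the start and no segment boundary
in range, the first policy stops promptly while the second never does,
which mirrors the idle-standby hang described above.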

Another question related to this - since we clearly don't need the
xlog switch in this case, should we make it conditional on the master
too, so we don't switch unnecessarily there either?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
