Re: pg_basebackup -x stream from the standby gets stuck

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup -x stream from the standby gets stuck
Date: 2012-03-02 13:26:32
Message-ID: CABUevEyqSUb4E1RrzJGe7e_M6yoaNg6kN1YBVeV76DX75DP81w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 28, 2012 at 09:22, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Hi,
>>>
>>> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>>>> =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>>>> xlog start point: 2/AC4E2600
>>>> pg_basebackup: starting background WAL receiver
>>>> 692447/692447 kB (100%), 1/1 tablespace
>>>> xlog end point: 2/AC4E2600
>>>> pg_basebackup: waiting for background process to finish streaming...
>>>> pg_basebackup: base backup completed
>>>>
>>>> real    3m56.237s
>>>> user    0m0.224s
>>>> sys     0m0.936s
>>>>
>>>> (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)
>>>
>>> The above article points out the problem of pg_basebackup from the standby:
>>> when "-x stream" is specified, pg_basebackup from the standby gets stuck if
>>> there is no traffic in the database.
>>>
>>> When "-x stream" is specified, pg_basebackup forks the background process
>>> for receiving WAL records during backup, takes an online backup and waits for
>>> the background process to end. The forked background process keeps receiving
>>> WAL records, and whenever it reaches end of WAL file, it checks whether it has
>>> already received all WAL files required for the backup, and exits if yes. Which
>>> means that at least one WAL segment switch is required for pg_basebackup with
>>> "-x stream" option to end.
>>>
>>> In the backup from the master, WAL file switch always occurs at both start and
>>> end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
>>> above logic works fine even if there is no traffic. OTOH, in the backup from the
>>> standby, while there is no traffic, WAL file switch is not performed at all. So
>>> in that case, there is no chance that the background process reaches end of WAL
>>> file, checks whether all required WAL has arrived, and exits. As a result,
>>> pg_basebackup gets stuck.
>>>
>>> To fix the problem, I'd propose to change the background process so that it
>>> checks whether all required WAL has arrived, every time data is received, even
>>> if end of WAL file is not reached. Patch attached. Comments?
>>
>> This seems like a good thing in general.
>>
>> Why does it need to modify pg_receivexlog, though? I thought only
>> pg_basebackup had this issue?
>>
>> I guess it is because of the change of the API to
>> stream_continue_callback only?
>
> Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c.
>
> But the reason why I changed segment_callback() in pg_receivexlog.c is not the
> same. I did that because previously segment_finish_callback was called only at
> the end of a WAL segment, but with the patch it can be called in the middle of
> a segment. OTOH, segment_callback() must emit a verbose message only when the
> current WAL segment is complete. So I had to add a check of whether the current
> WAL segment is partial or complete to segment_callback().

Yeah, I caught that.
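
(Editorial sketch.) The kind of check being discussed, where a callback that may now fire mid-segment only emits its verbose message for a complete segment, might look roughly like this. The function name `report_if_segment_complete`, the `wal_pos` type and the 16 MB `SEG_SIZE` are assumptions for illustration, not the actual pg_receivexlog code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: a WAL position modelled as a flat 64-bit byte offset. */
typedef uint64_t wal_pos;

/* Assumed segment size (16 MB) for this sketch. */
#define SEG_SIZE ((wal_pos) 16 * 1024 * 1024)

/* Called whenever data has been received up to 'pos'. Prints a message and
 * returns true only when 'pos' lies exactly on a segment boundary, i.e. the
 * segment that just ended is complete; a partial segment is skipped. */
static bool
report_if_segment_complete(FILE *out, wal_pos pos, bool verbose)
{
    assert(out != NULL);

    if (!verbose)
        return false;
    if (pos == 0 || pos % SEG_SIZE != 0)
        return false;           /* nothing received yet, or mid-segment */

    fprintf(out, "finished segment ending at %llu\n",
            (unsigned long long) pos);
    return true;
}
```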

>> Looking at it after your patch,
>> stream_continue_callback and segment_finish_callback are the same.
>> Should we perhaps just fold them into a single
>> stream_continue_callback? Since you had to move the "detect segment
>> end" to the caller anyway?
>
> No. I think we cannot do that because in pg_receivexlog they are not the same.

But couldn't they be made the same by making the same check as you put
in for the verbose message above?

>> Another question related to this - since we clearly don't need the
>> xlog switch in this case, should we make it conditional on the master
>> as well, so we don't switch unnecessarily there as well?
>
> Maybe. At the end of backup, we force WAL segment switch, to ensure all required
> WAL files have been archived. So theoretically if WAL archiving is not enabled,
> we can skip WAL segment switch. But some backup tools might depend on this
> behavior....

I was thinking we could keep doing it in pg_stop_backup(), but avoid
doing it when using pg_basebackup only...

> In standby-only backup, we always skip WAL segment switch. So there is no
> guarantee that all WAL files required for the backup are archived at the end
> of backup. This limitation is documented.

Right.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
