Re: pg_basebackup -x stream from the standby gets stuck

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup -x stream from the standby gets stuck
Date: 2012-05-23 12:25:49
Message-ID: CABUevEyRy-6V6EBocFq0Mzb=73DmkLrNtxQyODP5tw3BC0H=bg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 2, 2012 at 2:26 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Feb 28, 2012 at 09:22, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> Hi,
>>>>
>>>> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>>>>> =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>>>>> xlog start point: 2/AC4E2600
>>>>> pg_basebackup: starting background WAL receiver
>>>>> 692447/692447 kB (100%), 1/1 tablespace
>>>>> xlog end point: 2/AC4E2600
>>>>> pg_basebackup: waiting for background process to finish streaming...
>>>>> pg_basebackup: base backup completed
>>>>>
>>>>> real    3m56.237s
>>>>> user    0m0.224s
>>>>> sys     0m0.936s
>>>>>
>>>>> (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)
>>>>
>>>> The above article points out the problem of pg_basebackup from the standby:
>>>> when "-x stream" is specified, pg_basebackup from the standby gets stuck if
>>>> there is no traffic in the database.
>>>>
>>>> When "-x stream" is specified, pg_basebackup forks the background process
>>>> for receiving WAL records during backup, takes an online backup and waits for
>>>> the background process to end. The forked background process keeps receiving
>>>> WAL records, and whenever it reaches end of WAL file, it checks whether it has
>>>> already received all WAL files required for the backup, and exits if so. This
>>>> means that at least one WAL segment switch is required for pg_basebackup with
>>>> the "-x stream" option to end.
>>>>
>>>> In the backup from the master, WAL file switch always occurs at both start and
>>>> end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
>>>> above logic works fine even if there is no traffic. OTOH, in the backup from the
>>>> standby, while there is no traffic, WAL file switch is not performed at all. So
>>>> in that case, there is no chance that the background process reaches end of WAL
>>>> file, checks whether all required WAL has arrived, and exits. As a result,
>>>> pg_basebackup gets stuck.
>>>>
>>>> To fix the problem, I'd propose to change the background process so that it
>>>> checks whether all required WAL has arrived, every time data is received, even
>>>> if end of WAL file is not reached. Patch attached. Comments?
>>>
>>> This seems like a good thing in general.
>>>
>>> Why does it need to modify pg_receivexlog, though? I thought only
>>> pg_basebackup had this issue?
>>>
>>> I guess it is because of the change of the API to
>>> stream_continue_callback only?
>>
>> Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c.
>>
>> But the reason why I changed segment_callback() in pg_receivexlog.c is not the
>> same. I did that because previously segment_finish_callback was called only at
>> the end of a WAL segment, but with the patch it can be called in the middle of
>> a segment. OTOH, segment_callback() must emit a verbose message only when the
>> current WAL segment is complete. So I had to add a check of whether the current
>> WAL segment is partial or complete to segment_callback().
>
> Yeah, I caught that.
>
>
>>> Looking at it after your patch,
>>> stream_continue_callback and segment_finish_callback are the same.
>>> Should we perhaps just fold them into a single
>>> stream_continue_callback? Since you had to move the "detect segment
>>> end" to the caller anyway?
>>
>> No. I think we cannot do that because in pg_receivexlog they are not the same.
>
> But couldn't they be made the same by making the same check as you put
> in for the verbose message above?
>

While reviewing and cleaning this patch up a bit I noticed it actually
broke pg_receivexlog in the renaming.

Here is a new version of the patch, reworked based on the above so
we're down to a single callback. I moved the "rename last segment file
even if it's not complete" behavior into a parameter to ReceiveXlogStream()
instead of trying to overload a third piece of functionality onto the
callback (which is what broke pg_receivexlog).

How does this look? Have I overlooked any cases?
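
To make the stop condition discussed above concrete, here is a minimal
toy model in Python. It is not the C code from the patch: the function
name receive_stream, the check_every_chunk flag, and the use of plain
byte offsets as WAL positions are all hypothetical, chosen only for
illustration. The one real-world value is the 16 MB default WAL segment
size. It contrasts checking "have we received everything?" only at a
segment boundary with checking every time data is received.

```python
SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size, in bytes


def receive_stream(chunks, start_pos, stop_pos, check_every_chunk):
    """Toy model of the background WAL receiver's stop check.

    chunks: sizes (in bytes) of the WAL data messages, in arrival order.
    start_pos, stop_pos: byte offsets standing in for WAL positions
        (hypothetical simplification of real xlog locations).
    check_every_chunk: False models checking only when a segment
        boundary is crossed; True models checking after every receive.

    Returns the position at which the receiver decides to exit, or None
    if all chunks were consumed without exiting (it would keep waiting).
    """
    pos = start_pos
    for size in chunks:
        prev_seg = pos // SEG_SIZE
        pos += size
        crossed_segment = (pos // SEG_SIZE) != prev_seg
        if check_every_chunk or crossed_segment:
            if pos >= stop_pos:
                return pos  # all WAL required for the backup has arrived
    return None


# A small amount of WAL arrives, enough to pass the stop position but
# not enough to finish the current segment.
stuck = receive_stream([8192], 256, 4096, check_every_chunk=False)
done = receive_stream([8192], 256, 4096, check_every_chunk=True)
```

With the boundary-only check, stuck is None: the receiver has everything
it needs but never looks, which is the hang seen on an idle standby.
With the per-receive check, done is 8448 and the receiver can exit.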

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
xlog_stream2.patch application/octet-stream 7.9 KB
