Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date: 2012-06-11 15:47:00
Message-ID: CABUevEx+oitecsYAX7+7R5cAD5hriUF4R21Skiv9NSiDr+m2oA@mail.gmail.com
Lists: pgsql-admin pgsql-hackers

On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>>>>>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>>>>>>>>> This seems to be a bug. I think we should prevent pg_basebackup from
>>>>>>>>>>>> becoming a synchronous standby. Thoughts?
>>>>>>>>>>>
>>>>>>>>>>> Absolutely.  If we have replication clients that are not actually
>>>>>>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>>>>>>> to know that.
>>>>>>>>>>
>>>>>>>>>> I thought we fixed this already by sending InvalidXLogRecPtr as the flush
>>>>>>>>>> location? And that this only applied in 9.2?
>>>>>>>>>>
>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>>>>>>> streaming) as synchronous standby?
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> If so then yes, that is
>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>>>>>>> connection that's not even streaming log as standby!
>>>>>>>>>
>>>>>>>>> Agreed. The attached patch prevents pg_basebackup from becoming a sync
>>>>>>>>> standby. The patch also fixes another problem: currently only a walsender
>>>>>>>>> that has reached the STREAMING state can become the sync walsender. OTOH,
>>>>>>>>> the sync walsender assumes that a walsender with higher priority will be the
>>>>>>>>> sync one regardless of whether its state is STREAMING, and switches itself
>>>>>>>>> to potential sync walsender. So when a standby with higher priority connects
>>>>>>>>> to the master, we might have no sync standby until it reaches the
>>>>>>>>> STREAMING state. To fix this problem, the patch switches a walsender's
>>>>>>>>> state from sync to potential only *after* the walsender with higher
>>>>>>>>> priority has reached the STREAMING state.
>>>>>>>>>
>>>>>>>>> We also should not select (1) the background stream process forked from
>>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby, because they
>>>>>>>>> don't send back replication progress. To address this, I'm thinking of
>>>>>>>>> introducing a new option "NOSYNC" in the "START_REPLICATION" command
>>>>>>>>> as follows, and changing (1) and (2) so that they specify NOSYNC.
>>>>>>>>>
>>>>>>>>>    START_REPLICATION XXX/XXX [NOSYNC]
>>>>>>>>>
>>>>>>>>> If the standby specifies the NOSYNC option, it's never assigned as sync
>>>>>>>>> standby even if its name is in synchronous_standby_names. Thoughts?
>>>>>>>>
>>>>>>>> A standby which always sends back InvalidXLogRecPtr should not
>>>>>>>> become the sync one. So instead of a NOSYNC option, by checking whether
>>>>>>>> InvalidXLogRecPtr is sent, we can avoid a problematic sync standby.
>>>>>>>
>>>>>>> We should not do this because Magnus is proposing a patch
>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>>>>>>> which breaks the above assumption entirely. So we should introduce
>>>>>>> something like a NOSYNC option.
>>>>>>
>>>>>> Wouldn't the better choice in that case be to give a switch to
>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica,
>>>>>> and by default disallow it? And then keep the backend just treating
>>>>>> InvalidXLogRecPtr as don't-become-sync-replica.
>>>>>
>>>>> I don't object to making pg_receivexlog a sync standby at all. So at least
>>>>> for me, that switch is not necessary. What I'm worried about is the
>>>>> background stream process forked from pg_basebackup. I think that
>>>>> it should not run as a sync standby, but having it send back its
>>>>> replication progress seems helpful because a user can see the progress
>>>>> in pg_stat_replication. So I'm thinking that something like a NOSYNC
>>>>> option is required.
>>>>
>>>> On principle, no. By default, yes.
>>>>
>>>> How about:
>>>> pg_basebackup background: *never* sends flush location, and therefore
>>>> won't become sync replica
>>>> pg_receivexlog *optionally* sends flush location. By default won't
>>>> become sync replica, but can be made so with a switch
>>>
>>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication
>>> mistakenly conclude that pg_receivexlog (in default mode) and the
>>> pg_basebackup background process don't flush WAL files at all?
>>
>> That sounds like a "documentable issue".
>>
>> But maybe you're right, and we need the "never become sync" as a flag.
>
> So you agree to adding something like a NOSYNC option to the START_REPLICATION command?

I'm on the fence. I was hoping somebody else would chime in with an
opinion as well.

I just realized this thread is on -admin. Moving it to -hackers so
more of the right people will spot it.
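
For anyone joining on -hackers, the two options can be compared with a
small model. This is only a sketch in Python, not PostgreSQL source: the
WalSender class, the pick_sync_standby function and the nosync field are
hypothetical names made up for illustration, and INVALID_LSN merely stands
in for InvalidXLogRecPtr. It assumes, as discussed above, that only a
walsender in STREAMING state that is listed in synchronous_standby_names
is a candidate, and that a lower positive priority number wins.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

INVALID_LSN = 0  # stand-in for InvalidXLogRecPtr (assumption of this sketch)


@dataclass
class WalSender:
    name: str
    priority: int              # 0 = not listed in synchronous_standby_names
    state: str                 # "BACKUP", "CATCHUP" or "STREAMING"
    flush: int = INVALID_LSN   # last flush location reported by the client
    nosync: bool = False       # hypothetical NOSYNC flag on START_REPLICATION


def pick_sync_standby(senders: Sequence[WalSender],
                      use_nosync_flag: bool) -> Optional[WalSender]:
    """Return the walsender that would be chosen as sync standby, or None."""
    best = None
    for ws in senders:
        # A base backup connection or a standby still catching up never qualifies.
        if ws.priority <= 0 or ws.state != "STREAMING":
            continue
        if use_nosync_flag:
            # Option A: the client opts out explicitly, whatever it reports.
            if ws.nosync:
                continue
        else:
            # Option B: a client that never reports a flush location is skipped.
            if ws.flush == INVALID_LSN:
                continue
        # Lower positive number means higher priority.
        if best is None or ws.priority < best.priority:
            best = ws
    return best


senders = [
    WalSender("pg_basebackup", 1, "BACKUP"),
    WalSender("bg_stream", 1, "STREAMING", flush=100, nosync=True),
    WalSender("standby1", 2, "STREAMING", flush=90),
]
print(pick_sync_standby(senders, use_nosync_flag=False).name)
print(pick_sync_standby(senders, use_nosync_flag=True).name)
```

In this model the background streamer that reports its progress (so that
it shows up in pg_stat_replication) is picked as sync standby under the
flush-location rule, but is skipped when it can say NOSYNC explicitly.
That is the trade-off being discussed.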

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
