Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date: 2012-07-04 13:26:00
Message-ID: CABUevEwh70t0t19PNpfGxPmXdcT82G61_pyZ5LtWvdf0Ptci3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

On Mon, Jul 2, 2012 at 8:17 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Jul 2, 2012 at 4:01 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>>>>
>>>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>>>>> opinion as well.
>>>>>>>>
>>>>>>>> +1
>>>>>>>
>>>>>>> Nobody else with any opinion on this? :(
>>>>>>
>>>>>> I don't think we really need a NOSYNC flag at this point. Just not
>>>>>> setting the flush location in clients that make a point of flushing in
>>>>>> a timely fashion seems fine.
>>>>>
>>>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>>>>> patch attached.
>>>>>
>>>>> In the patch, pg_basebackup background process and pg_receivexlog always
>>>>> return invalid location as flush one, and will never become sync standby even
>>>>> if their name is in synchronous_standby_names. The timing of their sending
>>>>
>>>> That doesn't match with the patch, afaics. The patch always sets the
>>>> correct write location, which means it can become a remote_write
>>>> synchronous standby, no? It will only send it back when timeout
>>>> expires, but it will be sent back.
>>>
>>> No. Though correct write location is sent back, they don't become sync standby
>>> because flush location is always invalid. While flush location is
>>> invalid, the master
>>> will never regard the remote server as sync one even if synchronous_commit is
>>> set to remote_write.
>>
>> Oh. I wasn't aware of that part.
>>
>>
>>>> I wonder if that might actually be a more reasonable mode of operation
>>>> in general:
>>>>
>>>> * always send back the write position, at the write interval
>>>> * always send back the flush position, when we're flushing (meaning
>>>> when we switch xlog)
>>>>
>>>> have an option that makes it possible to:
>>>> * always send back the write position as soon as it changes (making
>>>> for a reasonable remote_write sync standby)
>>>> * actually flush the log after each write instead of end of file
>>>> (making for a reasonable full sync standby)
>>>>
>>>> meaning you'd have something like "pg_receivexlog --sync=write" and
>>>> "pg_receivexlog --sync=flush" controlling it instead.
>>>
>>> Yeah, in this way, pg_receivexlog can become sync even if
>>> synchronous_commit is on, which seems more useful. But
>>> I'm thinking that the synchronous pg_receivexlog stuff should
>>> be postponed to 9.3 because its patch seems to become too
>>> big to apply at this beta stage. So, in 9.2, to fix the problem,
>>> what about just applying the simple patch which prevents
>>> pg_basebackup background process and pg_receivexlog from
>>> becoming sync standby whatever synchronous_standby_names
>>> and synchronous_commit are set to?
>>
>> Agreed.
>>
>> With the addition that we should set the write location, because
>> that's very useful and per what you said above should be perfectly
>> safe.
>>
>>
>>>> And deal with the "user put * in synchronous_standby_names and
>>>> accidentally got pg_receivexlog as the sync standby" by more clearly
>>>> warning people not to use * for that parameter... Since it's simply
>>>> dangerous :)
>>>
>>> Yep.
>>
>> What would be good wording? Something along the line of "Using the *
>> entry is not recommended since it can lead to unexpected results when
>> new standbys are added" or something like that?
>>
>>
>>>>> the reply depends on the standby_message_timeout specified in -s option. So
>>>>> the write position may lag behind the true position.
>>>>>
>>>>> pg_receivexlog accepts new option -S (better option character?). If this option
>>>>> is specified, pg_receivexlog returns true flush position, and can become sync
>>>>> standby. It sends back the reply to the master each time the write position
>>>>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>>>>> synchronous replication to pg_receivexlog would work well.
>>>>
>>>> Yeah, I hadn't considered the remote_write mode, but I guess that's
>>>> why you have to track the current write position across loads, which
>>>> first confused me.
>>>
>>> The patch has to track the current write location to decide whether to send
>>> back the reply to the master, IOW to know whether the write location
>>> has changed, IOW to know whether we've already sent the reply about
>>> the latest write location.
>>
>> Yeha, makes perfect sense.
>>
>>
>>>> Looking at some other usecases for this, I wonder if we should also
>>>> force a status message whenever we switch xlog files, even if we
>>>> aren't running in sync mode, even if the timeout hasn't expired. I
>>>> think that would be a reasonable thing to do, since you often want to
>>>> track things based on files.
>>>
>>> You mean that the pg_receivexlog should send back the correct flush
>>> location whenever it switches xlog files?
>>
>> No, I mean just send back a status message. Meaning that without
>> specifiying the sync modes per above, it would send back the *write*
>> location. This would be useful for tracking xlog filenames between
>> master and pg_receivexlog, without extra delay.
>>
>>>>> The patch needs more documentation. But I think that it's worth reviewing the
>>>>> code in advance, so I attached the WIP patch. Comments? Objections?
>>>>
>>>> Looking at the code, what exactly prompts the changes to the backend
>>>> side? That seems unrelated? Are we actually considering picking a
>>>> standby with InvalidXlogRecPtr as a sync standby today?
>>>>
>>>> Isn't it enough to just send the proper write and flush locations from
>>>> the frontend?
>>>
>>> No, unless I'm missing something.
>>>
>>> The problem that we should address first is that the master can pick up
>>> pg_basebackup background process and pg_receivexlog as a sync
>>> standby even if they always return an invalid write and flush positions.
>>> Since they don't send any correct write and flush positions, if they are
>>> accidentally regarded as sync standby, all transactions can get blocked
>>> infinitely. So the patch needed to change the walsender code so that it
>>> doesn't pick up the remote server as sync one while its flush position
>>> is invalid.
>>
>> Yeah, that is clearly wrong. I think I missed this behaviour, and got
>> confused by the fact that the patch was trying to fix two different
>> things - only one of which I was aware of.
>>
>> So yes, per above, let's isolate out this part as one patch and get
>> that into 9.2, along with the "set the proper write location", but
>> leave everything else for 9.3.
>
> Agreed. The attached patch always sets the correct write location and
> prevents the remote server sending back invalid flush location from
> becoming sync standby.

Thanks, applied.

I also put the prevent invalid flush location from becoming a sync
standby part on 9.1 (required only a minor change -
GetXLogReplayRecPtr() didn't take an argument back then).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Akash Kodibail 2012-07-04 18:30:00 Delay in completion of aggregation inserts when run in a single commit - PG 9.1.2
Previous Message Magnus Hagander 2012-07-04 06:52:59 Re: webclient for postgresql

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2012-07-04 13:26:59 Re: pg_receivexlog and feedback message
Previous Message Joel Jacobson 2012-07-04 13:02:39 Re: Schema version management