Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date: 2012-07-01 19:01:12
Message-ID: CABUevEx7xWv8pFVWTuA2WM157AEWPB5Gkc=p6BhEytJE3aEa8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>>
>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>>> opinion as well.
>>>>>>
>>>>>> +1
>>>>>
>>>>> Nobody else with any opinion on this? :(
>>>>
>>>> I don't think we really need a NOSYNC flag at this point.  Just not
>>>> setting the flush location in clients that make a point of flushing in
>>>> a timely fashion seems fine.
>>>
>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>>> patch attached.
>>>
>>> In the patch, pg_basebackup background process and pg_receivexlog always
>>> return invalid location as flush one, and will never become sync standby even
>>> if their name is in synchronous_standby_names. The timing of their sending
>>
>> That doesn't match with the patch, afaics. The patch always sets the
>> correct write location, which means it can become a remote_write
>> synchronous standby, no? It will only send it back when timeout
>> expires, but it will be sent back.
>
> No. Though correct write location is sent back, they don't become sync standby
> because flush location is always invalid. While flush location is
> invalid, the master
> will never regard the remote server as sync one even if synchronous_commit is
> set to remote_write.

Oh. I wasn't aware of that part.

>> I wonder if that might actually be a more reasonable mode of operation
>> in general:
>>
>> * always send back the write position, at the write interval
>> * always send back the flush position, when we're flushing (meaning
>> when we switch xlog)
>>
>> have an option that makes it possible to:
>> * always send back the write position as soon as it changes (making
>> for a reasonable remote_write sync standby)
>> * actually flush the log after each write instead of end of file
>> (making for a reasonable full sync standby)
>>
>> meaning you'd have something like "pg_receivexlog --sync=write" and
>> "pg_receivexlog --sync=flush" controlling it instead.
>
> Yeah, in this way, pg_receivexlog can become sync even if
> synchronous_commit is on, which seems more useful. But
> I'm thinking that the synchronous pg_receivexlog stuff should
> be postponed to 9.3 because its patch seems to become too
> big to apply at this beta stage. So, in 9.2, to fix the problem,
> what about just applying the simple patch which prevents
> pg_basebackup background process and pg_receivexlog from
> becoming sync standby whatever synchronous_standby_names
> and synchronous_commit are set to?

Agreed.

With the addition that we should set the write location, because
that's very useful and per what you said above should be perfectly
safe.

>> And deal with the "user put * in synchronous_standby_names and
>> accidentally got pg_receivexlog as the sync standby" by more clearly
>> warning people not to use * for that parameter... Since it's simply
>> dangerous :)
>
> Yep.

What would be good wording? Something along the line of "Using the *
entry is not recommended since it can lead to unexpected results when
new standbys are added" or something like that?

>>> the reply depends on the standby_message_timeout specified in -s option. So
>>> the write position may lag behind the true position.
>>>
>>> pg_receivexlog accepts new option -S (better option character?). If this option
>>> is specified, pg_receivexlog returns true flush position, and can become sync
>>> standby. It sends back the reply to the master each time the write position
>>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>>> synchronous replication to pg_receivexlog would work well.
>>
>> Yeah, I hadn't considered the remote_write mode, but I guess that's
>> why you have to track the current write position across loads, which
>> first confused me.
>
> The patch has to track the current write location to decide whether to send
> back the reply to the master, IOW to know whether the write location
> has changed, IOW to know whether we've already sent the reply about
> the latest write location.

Yeha, makes perfect sense.

>> Looking at some other usecases for this, I wonder if we should also
>> force a status message whenever we switch xlog files, even if we
>> aren't running in sync mode, even if the timeout hasn't expired. I
>> think that would be a reasonable thing to do, since you often want to
>> track things based on files.
>
> You mean that the pg_receivexlog should send back the correct flush
> location whenever it switches xlog files?

No, I mean just send back a status message. Meaning that without
specifiying the sync modes per above, it would send back the *write*
location. This would be useful for tracking xlog filenames between
master and pg_receivexlog, without extra delay.

>>> The patch needs more documentation. But I think that it's worth reviewing the
>>> code in advance, so I attached the WIP patch. Comments? Objections?
>>
>> Looking at the code, what exactly prompts the changes to the backend
>> side? That seems unrelated? Are we actually considering picking a
>> standby with InvalidXlogRecPtr as a sync standby today?
>>
>> Isn't it enough to just send the proper write and flush locations from
>> the frontend?
>
> No, unless I'm missing something.
>
> The problem that we should address first is that the master can pick up
> pg_basebackup background process and pg_receivexlog as a sync
> standby even if they always return an invalid write and flush positions.
> Since they don't send any correct write and flush positions, if they are
> accidentally regarded as sync standby, all transactions can get blocked
> infinitely. So the patch needed to change the walsender code so that it
> doesn't pick up the remote server as sync one while its flush position
> is invalid.

Yeah, that is clearly wrong. I think I missed this behaviour, and got
confused by the fact that the patch was trying to fix two different
things - only one of which I was aware of.

So yes, per above, let's isolate out this part as one patch and get
that into 9.2, along with the "set the proper write location", but
leave everything else for 9.3.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Fujii Masao 2012-07-02 18:17:16 Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Previous Message Fujii Masao 2012-07-01 17:14:25 Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

Browse pgsql-hackers by date

  From Date Subject
Next Message Nils Goroll 2012-07-01 19:02:05 Re: Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
Previous Message Robert Haas 2012-07-01 18:50:04 Re: new --maintenance-db options