Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date: 2012-07-01 17:14:25
Message-ID: CAHGQGwFfQA4X2M=vPkaoGfpN7t19=GhHZ0gFGHfy7_p=1Cka4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Thanks for the review!

On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>
>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>> opinion as well.
>>>>>
>>>>> +1
>>>>
>>>> Nobody else with any opinion on this? :(
>>>
>>> I don't think we really need a NOSYNC flag at this point. Just not
>>> setting the flush location in clients that make a point of flushing in
>>> a timely fashion seems fine.
>>
>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>> patch attached.
>>
>> In the patch, pg_basebackup background process and pg_receivexlog always
>> return invalid location as flush one, and will never become sync standby even
>> if their name is in synchronous_standby_names. The timing of their sending
>
> That doesn't match with the patch, afaics. The patch always sets the
> correct write location, which means it can become a remote_write
> synchronous standby, no? It will only send it back when timeout
> expires, but it will be sent back.

No. Though correct write location is sent back, they don't become sync standby
because flush location is always invalid. While flush location is
invalid, the master
will never regard the remote server as sync one even if synchronous_commit is
set to remote_write.

>
> I wonder if that might actually be a more reasonable mode of operation
> in general:
>
> * always send back the write position, at the write interval
> * always send back the flush position, when we're flushing (meaning
> when we switch xlog)
>
> have an option that makes it possible to:
> * always send back the write position as soon as it changes (making
> for a reasonable remote_write sync standby)
> * actually flush the log after each write instead of end of file
> (making for a reasonable full sync standby)
>
> meaning you'd have something like "pg_receivexlog --sync=write" and
> "pg_receivexlog --sync=flush" controlling it instead.

Yeah, in this way, pg_receivexlog can become sync even if
synchronous_commit is on, which seems more useful. But
I'm thinking that the synchronous pg_receivexlog stuff should
be postponed to 9.3 because its patch seems to become too
big to apply at this beta stage. So, in 9.2, to fix the problem,
what about just applying the simple patch which prevents
pg_basebackup background process and pg_receivexlog from
becoming sync standby whatever synchronous_standby_names
and synchronous_commit are set to?

> And deal with the "user put * in synchronous_standby_names and
> accidentally got pg_receivexlog as the sync standby" by more clearly
> warning people not to use * for that parameter... Since it's simply
> dangerous :)

Yep.

>> the reply depends on the standby_message_timeout specified in -s option. So
>> the write position may lag behind the true position.
>>
>> pg_receivexlog accepts new option -S (better option character?). If this option
>> is specified, pg_receivexlog returns true flush position, and can become sync
>> standby. It sends back the reply to the master each time the write position
>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>> synchronous replication to pg_receivexlog would work well.
>
> Yeah, I hadn't considered the remote_write mode, but I guess that's
> why you have to track the current write position across loads, which
> first confused me.

The patch has to track the current write location to decide whether to send
back the reply to the master, IOW to know whether the write location
has changed, IOW to know whether we've already sent the reply about
the latest write location.

> Looking at some other usecases for this, I wonder if we should also
> force a status message whenever we switch xlog files, even if we
> aren't running in sync mode, even if the timeout hasn't expired. I
> think that would be a reasonable thing to do, since you often want to
> track things based on files.

You mean that the pg_receivexlog should send back the correct flush
location whenever it switches xlog files?

>> The patch needs more documentation. But I think that it's worth reviewing the
>> code in advance, so I attached the WIP patch. Comments? Objections?
>
> Looking at the code, what exactly prompts the changes to the backend
> side? That seems unrelated? Are we actually considering picking a
> standby with InvalidXlogRecPtr as a sync standby today?
>
> Isn't it enough to just send the proper write and flush locations from
> the frontend?

No, unless I'm missing something.

The problem that we should address first is that the master can pick up
pg_basebackup background process and pg_receivexlog as a sync
standby even if they always return an invalid write and flush positions.
Since they don't send any correct write and flush positions, if they are
accidentally regarded as sync standby, all transactions can get blocked
infinitely. So the patch needed to change the walsender code so that it
doesn't pick up the remote server as sync one while its flush position
is invalid.

>> The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied,
>> we need to write the backport version of the patch for 9.2.
>
> Oh, conflicts with Heikkis xlog patches, right? Ugh. But yeah.

Yep.

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Magnus Hagander 2012-07-01 19:01:12 Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Previous Message Christian Rosnes 2012-06-30 15:27:34 Re: Oldest xmin is far in the past

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-07-01 17:20:42 Re: XX000: enum value 117721 not found in cache for enum enumcrash
Previous Message Stefan Kaltenbrunner 2012-07-01 17:04:18 compiler warnings on the buildfarm