Re: pg_basebackup blocking all queries with horrible performance

From: Lonni J Friedman <netllama(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject: Re: pg_basebackup blocking all queries with horrible performance
Date: 2012-06-08 19:30:48
Message-ID: CAP=oouGZ4=B8fg6694urDoL-E7SsksOuTZGXmP_6UQ3T67vj5w@mail.gmail.com
Lists: pgsql-admin pgsql-hackers

On Thu, Jun 7, 2012 at 11:04 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
> On 06/08/2012 09:01 AM, Lonni J Friedman wrote:
>>
>> On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers <gsievers19(at)comcast(dot)net> wrote:
>>>
>>> You might try stopping pg_basebackup in place with SIGSTOP and check
>>> if the problem goes away. SIGCONT and you should start having
>>> sluggishness again.
>>>
>>> If verified, then any sort of throttling mechanism should work.
>>
>>
>> I'm certain that the problem is triggered only when pg_basebackup is
>> running. It's very predictable, and goes away as soon as pg_basebackup
>> finishes running. What do you mean by a throttling mechanism?
>
>
> Sure, it only happens when pg_basebackup is running. But if you *pause*
> pg_basebackup, so it's still running but not currently doing work, does the
> problem go away? Does it come back when you unpause pg_basebackup? That's
> what Jerry was telling you to try.
>
> If the problem goes away when you pause pg_basebackup and comes back when
> you unpause it, it's probably a system load problem.
>
> If it doesn't go away, it's more likely to be a locking issue or something
> _other_ than simple load.
>
> SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT") resumes
> it, so on Linux you can use these to try and find out. When you SIGSTOP
> pg_basebackup then the postgres backend associated with it should block
> shortly afterwards as its buffers fill up and it can't send more data, so
> the load should come off the server.
>
> A "throttling mechanism" refers to anything that limits the rate or speed of
> a thing. In this case, what you want to do if your problem is system
> overload is to limit the speed at which pg_basebackup does its work so other
> things can still get work done. In other words you want to throttle it.
> Typical throttling mechanisms include the "ionice" and "renice" commands to
> change I/O and CPU priority, respectively.
>
> Note that you may need to change the priority of the *backend* that
> pg_basebackup is using, not necessarily the pg_basebackup command itself.
> I haven't done enough with Pg's replication to know how that works, so
> someone else will have to fill that bit in.
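Craig's renice/ionice suggestion can be sketched in Python. This is a minimal sketch, assuming you already know the PID of the process to throttle (per Craig's note, that may be the server backend serving pg_basebackup rather than the pg_basebackup client). The helper names `lower_cpu_priority`, `ionice_cmd` and `throttle` are hypothetical; the CPU part uses the standard `os.setpriority` call (what renice does), and the I/O part shells out to util-linux `ionice -c 2 -n 7 -p PID`, i.e. the best-effort class at its lowest priority.

```python
import os
import subprocess


def lower_cpu_priority(pid: int, niceness: int = 19) -> None:
    """Equivalent of `renice -n 19 -p PID`: 19 is the lowest CPU priority on Linux."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)


def ionice_cmd(pid: int, level: int = 7) -> list[str]:
    """Build a util-linux ionice command: best-effort class (-c 2) at `level` (0-7, 7 = lowest)."""
    return ["ionice", "-c", "2", "-n", str(level), "-p", str(pid)]


def throttle(pid: int) -> None:
    """Hypothetical helper: lower both the CPU and the I/O priority of `pid`."""
    lower_cpu_priority(pid)
    # check=True raises CalledProcessError if ionice exits non-zero
    # (for example when we are not allowed to change this PID).
    subprocess.run(ionice_cmd(pid), check=True)
```

Lowering the priority of another user's process (such as a backend owned by the postgres user) needs root or that user's account. I/O priorities are also only honoured by some Linux I/O schedulers (CFQ being the usual one at the time of this thread), so the ionice half may do nothing elsewhere. The best-effort class is used rather than the idle class (`-c 3`) because idle I/O can starve the backup entirely on a busy disk.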

Thanks for your reply. I've confirmed that issuing a SIGSTOP does
eliminate the thrashing, and issuing a SIGCONT brings the thrashing back.
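The kill -STOP / kill -CONT experiment can also be scripted. A minimal Python sketch, assuming you already have pg_basebackup's PID (for example from `pgrep pg_basebackup`); the helper names are hypothetical, and `os.kill` with `signal.SIGSTOP` / `signal.SIGCONT` sends the same signals as the shell commands.

```python
import os
import signal
import time


def pause_process(pid: int) -> None:
    """Suspend `pid`, like `kill -STOP PID`. SIGSTOP cannot be caught or ignored."""
    os.kill(pid, signal.SIGSTOP)


def resume_process(pid: int) -> None:
    """Resume a suspended `pid`, like `kill -CONT PID`."""
    os.kill(pid, signal.SIGCONT)


def pause_for(pid: int, seconds: float) -> None:
    """Pause `pid` for `seconds`, always resuming it afterwards (even on Ctrl-C)."""
    pause_process(pid)
    try:
        time.sleep(seconds)
    finally:
        resume_process(pid)
```

While `pause_for(pid, 30)` is sleeping you can watch whether query latency recovers; the `finally` clause makes sure pg_basebackup is not left stopped if the script is interrupted.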

I've looked at iostat output both before & during pg_basebackup runs,
and I'm not seeing any indication that the problem is due to disk IO
bottlenecks. The numbers don't vary very much at all between the good
& bad times. This is typical when pg_basebackup is running:
########
Device:      rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
md0            0.00     0.00    67.76    68.62     4.42     1.46    88.34     0.00     0.00     0.00     0.00     0.00     0.00
########

and this is when the system is ok:
########
Device:      rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
md0            0.00     0.00    68.04    68.56     4.44     1.46    88.39     0.00     0.00     0.00     0.00     0.00     0.00
########
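To put a number on "don't vary very much", here is a small Python sketch that computes the per-column percent change between the two md0 samples above. The values are copied from this message; `relative_change` and `compare` are illustrative helper names, not part of iostat.

```python
# iostat -x columns for md0, copied from the two samples in this message.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rMB/s", "wMB/s",
           "avgrq-sz", "avgqu-sz", "await", "r_await", "w_await", "svctm", "%util"]
DURING_BACKUP = [0.00, 0.00, 67.76, 68.62, 4.42, 1.46,
                 88.34, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
HEALTHY = [0.00, 0.00, 68.04, 68.56, 4.44, 1.46,
           88.39, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]


def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`; 0.0 when both are zero."""
    if before == 0.0:
        return 0.0 if after == 0.0 else float("inf")
    return (after - before) / before * 100.0


def compare(good: list[float], bad: list[float]) -> dict[str, float]:
    """Map each column name to its percent change from the good to the bad sample."""
    return {name: relative_change(g, b) for name, g, b in zip(COLUMNS, good, bad)}


if __name__ == "__main__":
    for name, pct in compare(HEALTHY, DURING_BACKUP).items():
        print(f"{name:>9}: {pct:+.2f}%")
```

Every column changes by less than 0.5% between the two samples; the largest change is rMB/s at about -0.45%, with r/s at about -0.41%.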

I looked at vmstat output, but nothing is jumping out at me as being
dramatically different when pg_basebackup is running. Swap in and
swap out are zero 100% of the time for both the good & bad perf cases. I
can post example output if someone is interested, or if there's
something specific that I should be looking at as a potential problem,
let me know.

thanks
