Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size

From: Valentine Gogichashvili <valgog(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size
Date: 2012-08-16 14:53:19
Message-ID: CAP93muXCLBBnHuWrbr8Lh6tNTFNVYRTp9VRTSApMK1UY+QpYYA@mail.gmail.com
Lists: pgsql-bugs

Hello Andres,

Here is the process, which is currently using no CPU at all, with
shared_buffers set to 2GB:

50978 postgres 20 0 2288m 2.0g 2.0g S 0.0 1.6 4225:34 postgres: startup process recovering 000000050000262E000000FD

It has been hanging on that file for several minutes now.

And here is the strace summary:

$ strace -c -f -p 50978
Process 50978 attached - interrupt to quit
Process 50978 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.82    0.007999          37       215           select
  2.73    0.000230           1       215           getppid
  2.45    0.000207           1       215       215 stat
------ ----------- ----------- --------- --------- ----------------
100.00    0.008436                   645       215 total
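
For reference, the summary can be tallied with a short script. This is only a minimal sketch: the helper name `parse_strace_summary` is hypothetical, and the sample rows are copied from the strace output in this message. It shows that all three syscalls were called the same number of times and that every `stat` call failed, which would be consistent with an idle wait loop rather than CPU-bound work.

```python
# Hypothetical helper: tally the data rows of a `strace -c` summary.
# The rows below are copied from the strace output in this message.
SUMMARY = """\
94.82 0.007999 37 215 select
2.73 0.000230 1 215 getppid
2.45 0.000207 1 215 215 stat
"""


def parse_strace_summary(text):
    """Return {syscall: {"seconds", "calls", "errors"}} for each data row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) not in (5, 6):
            continue  # skip headers, separators, and blank lines
        # Columns: % time, seconds, usecs/call, calls, [errors], syscall
        rows[fields[-1]] = {
            "seconds": float(fields[1]),
            "calls": int(fields[3]),
            # The errors column is empty when no call failed.
            "errors": int(fields[4]) if len(fields) == 6 else 0,
        }
    return rows


rows = parse_strace_summary(SUMMARY)
for name, row in rows.items():
    print(name, row["calls"], row["errors"])

# Total time spent inside traced syscalls during the sample.
print(round(sum(row["seconds"] for row in rows.values()), 6))
```

The total time spent in syscalls during the sample is under 10 ms, so the process was mostly sleeping in `select` while it was traced.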

What kind of additional profiling information would you like to see?

Regards,

-- Valentin

On Wed, Aug 15, 2012 at 4:09 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> Hi,
>
> On Wednesday, August 15, 2012 12:10:42 PM valgog(at)gmail(dot)com wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 7494
> > Logged by: Valentine Gogichashvili
> > Email address: valgog(at)gmail(dot)com
> > PostgreSQL version: 9.0.7
> > Operating system: Linux version 2.6.32-5-amd64 (Debian 2.6.32-41)
> > Description:
> >
> > We are experiencing strange(?) behavior on the replication slave machines.
> > The master machine has a very heavy update load, where many processes are
> > updating lots of data. It generates up to 30GB of WAL files per hour.
> > Normally it is not a problem for the slave machines to replay this amount
> > of WAL files on time and keep up with the master. But at some moments, the
> > slaves are “hanging” with 100% CPU usage on the WAL replay process and 3%
> > IOWait, needing up to 30 seconds to process one WAL file. If this tipping
> > point is reached, then a huge WAL replication lag builds up quite fast,
> > which also leads to overfilling of the XLOG directory on the slave
> > machines, as the WAL receiver is putting the WAL files it gets via
> > streaming replication into the XLOG directory (which in many cases is a
> > separate disk partition of quite limited size).
> Could you try to get a profile of that 100% cpu time?
>
> Greetings,
>
> Andres
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
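
A rough back-of-the-envelope check of the figures in the quoted report may help to see why the lag builds up so quickly. This is a sketch under stated assumptions: the report does not give the WAL segment size, so the PostgreSQL default of 16 MB is assumed, 1 GB is taken as 1024 MB, and the function names are made up for illustration.

```python
# Assumption: default 16 MB WAL segments (the report does not state the size).
SEGMENT_MB = 16


def segments_per_hour(wal_gb_per_hour, segment_mb=SEGMENT_MB):
    """WAL segments generated per hour at the given WAL volume."""
    return wal_gb_per_hour * 1024 / segment_mb


def replay_budget_seconds(wal_gb_per_hour, segment_mb=SEGMENT_MB):
    """Maximum replay time per segment that still keeps up with the master."""
    return 3600 / segments_per_hour(wal_gb_per_hour, segment_mb)


def backlog_growth_gb_per_hour(wal_gb_per_hour, replay_seconds,
                               segment_mb=SEGMENT_MB):
    """Unreplayed WAL accumulating per hour at a given replay time per segment."""
    generated = segments_per_hour(wal_gb_per_hour, segment_mb)
    replayed = 3600 / replay_seconds
    return max(generated - replayed, 0) * segment_mb / 1024


# Figures from the report: up to 30GB of WAL per hour, up to 30 s per file.
print(segments_per_hour(30))               # 1920.0
print(replay_budget_seconds(30))           # 1.875
print(backlog_growth_gb_per_hour(30, 30))  # 28.125
```

Under these assumptions, and at the peak rates quoted, a slave has just under 2 seconds per segment to keep up, and at 30 seconds per segment the backlog would grow by roughly 28 GB per hour.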
