Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Valentine Gogichashvili <valgog(at)gmail(dot)com>
Subject: Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size
Date: 2012-08-17 07:31:42
Message-ID: 201208170931.43034.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Valentine,

On Thursday, August 16, 2012 05:05:11 PM Valentine Gogichashvili wrote:
> Hello again,
>
> now I have restarted the slave with shared_buffers set to 32GB, and now the
> recovery process is at 100% CPU:

> $ strace -c -f -p 27076
> Process 27076 attached - interrupt to quit
> Process 27076 detached
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 44.44 0.000240 0 5976 lseek
> 38.15 0.000206 0 5522 getppid
> 17.41 0.000094 1 172 read
> 0.00 0.000000 0 4 1 open
> 0.00 0.000000 0 2 close
> 0.00 0.000000 0 1 stat
> 0.00 0.000000 0 159 159 unlink
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.000540 11836 160 total

> $ strace -c -f -p 36450
> Process 36450 attached - interrupt to quit
> Process 39336 attached
> Process 39337 attached
> Process 39337 detached
> Process 39338 attached
> Process 39338 detached
> Process 39339 attached
> Process 39339 detached
> Process 39336 detached
> Process 39359 attached
> Process 39360 attached
> Process 39360 detached
> Process 39361 attached
> Process 39361 detached
> Process 39362 attached
> Process 39362 detached
> Process 39359 detached
> Process 40168 attached
> Process 40169 attached
> Process 40169 detached
> Process 40170 attached
> Process 40170 detached
> Process 40171 attached
> Process 40171 detached
> Process 40168 detached
> Process 36450 detached
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 88.21 0.668159 31817 21 9 wait4
> 4.69 0.035553 6 5970 read
> 1.92 0.014572 0 120565 lseek
> 1.91 0.014474 5 2957 2954 unlink
> 1.55 0.011766 8 1545 write
> 1.27 0.009582 0 109274 getppid
> 0.31 0.002340 3 762 fcntl
> 0.03 0.000245 20 12 clone
> 0.02 0.000163 1 219 rt_sigprocmask
> 0.02 0.000162 1 154 10 open
> 0.01 0.000092 0 216 mmap
> 0.01 0.000084 1 133 42 stat
> 0.01 0.000066 6 12 getrlimit
> 0.01 0.000046 0 178 close
>
Those straces don't indicate contention in the kernel to me.

Could you get a profile with 'perf' or if youre more familiar with it with
'oprofile'. With the former its just something like:
perf record -F 10000 -p 27076
perf report > somefile

And then send in somefile.

Greetings,

Andres

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2012-08-17 11:07:16 Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size
Previous Message Bruce Momjian 2012-08-17 01:32:03 Re: BUG #6067: In PL/pgsql, EXISTS(SELECT ... INTO...) fails