Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size

From: Valentine Gogichashvili <valgog(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size
Date: 2012-08-16 15:05:11
Message-ID: CAP93muX=fbhOfjdYDvG8iJ0sPj4ptoHOV_mAqKE+9qvZmbUEiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello again,

now I have restarted the slave with shared_buffers set to 32GB, and now the
recovery process is at 100% CPU:

$ strace -c -f -p 27076
Process 27076 attached - interrupt to quit
Process 27076 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
67.65 0.002127 0 18127 lseek
28.98 0.000911 0 16496 getppid
2.58 0.000081 0 758 read
0.80 0.000025 4 6 1 open
0.00 0.000000 0 4 close
0.00 0.000000 0 1 stat
0.00 0.000000 0 493 493 unlink
------ ----------- ----------- --------- --------- ----------------
100.00 0.003144 35885 494 total

$ strace -c -f -p 27076
Process 27076 attached - interrupt to quit
Process 27076 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
44.44 0.000240 0 5976 lseek
38.15 0.000206 0 5522 getppid
17.41 0.000094 1 172 read
0.00 0.000000 0 4 1 open
0.00 0.000000 0 2 close
0.00 0.000000 0 1 stat
0.00 0.000000 0 159 159 unlink
------ ----------- ----------- --------- --------- ----------------
100.00 0.000540 11836 160 total

Then restarted again with shared_buffers set back to 2GB:

now the replay process is relatively fast again and here is the strace:

$ strace -c -f -p 36450
Process 36450 attached - interrupt to quit
Process 39336 attached
Process 39337 attached
Process 39337 detached
Process 39338 attached
Process 39338 detached
Process 39339 attached
Process 39339 detached
Process 39336 detached
Process 39359 attached
Process 39360 attached
Process 39360 detached
Process 39361 attached
Process 39361 detached
Process 39362 attached
Process 39362 detached
Process 39359 detached
Process 40168 attached
Process 40169 attached
Process 40169 detached
Process 40170 attached
Process 40170 detached
Process 40171 attached
Process 40171 detached
Process 40168 detached
Process 36450 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
88.21 0.668159 31817 21 9 wait4
4.69 0.035553 6 5970 read
1.92 0.014572 0 120565 lseek
1.91 0.014474 5 2957 2954 unlink
1.55 0.011766 8 1545 write
1.27 0.009582 0 109274 getppid
0.31 0.002340 3 762 fcntl
0.03 0.000245 20 12 clone
0.02 0.000163 1 219 rt_sigprocmask
0.02 0.000162 1 154 10 open
0.01 0.000092 0 216 mmap
0.01 0.000084 1 133 42 stat
0.01 0.000066 6 12 getrlimit
0.01 0.000046 0 178 close
0.00 0.000037 1 27 munmap
0.00 0.000035 6 6 pipe
0.00 0.000035 12 3 set_tid_address
0.00 0.000030 1 42 getegid
0.00 0.000021 0 183 rt_sigaction
0.00 0.000000 0 108 fstat
0.00 0.000000 0 102 mprotect
0.00 0.000000 0 36 brk
0.00 0.000000 0 9 rt_sigreturn
0.00 0.000000 0 9 9 ioctl
0.00 0.000000 0 48 12 access
0.00 0.000000 0 6 dup2
0.00 0.000000 0 6 getpid
0.00 0.000000 0 12 execve
0.00 0.000000 0 9 uname
0.00 0.000000 0 42 getuid
0.00 0.000000 0 42 getgid
0.00 0.000000 0 45 geteuid
0.00 0.000000 0 6 getpgrp
0.00 0.000000 0 6 statfs
0.00 0.000000 0 12 arch_prctl
0.00 0.000000 0 6 3 futex
0.00 0.000000 0 3 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.757462 242786 3039 total

With best regards,

-- Valentine Gogichashvili

On Thu, Aug 16, 2012 at 4:53 PM, Valentine Gogichashvili
<valgog(at)gmail(dot)com>wrote:

> Hello Andreas,
>
> here is the process, that now actually is not using CPU at all and the
> shared_buffers are set to 2GB:
>
> 50978 postgres 20 0 2288m 2.0g 2.0g S 0.0 1.6 4225:34 postgres:
> startup process recovering 000000050000262E000000FD
>
> It is hanging on that file for several minutes now.
>
> and here is the strace:
>
> $ strace -c -f -p 50978
> Process 50978 attached - interrupt to quit
> Process 50978 detached
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 94.82 0.007999 37 215 select
> 2.73 0.000230 1 215 getppid
> 2.45 0.000207 1 215 215 stat
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.008436 645 215 total
>
> What kind of additional profiling information would you like to see?
>
> Regards,
>
> -- Valentin
>
>
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Valentine Gogichashvili 2012-08-16 15:13:47 Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size
Previous Message Valentine Gogichashvili 2012-08-16 14:53:19 Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size