Stats collector on rampage (8.2.3)

From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Stats collector on rampage (8.2.3)
Date: 2007-10-29 08:52:24
Message-ID: 47259F48.1010107@bigfoot.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,
it seems that the stats collector on my box is using more CPU than
it did in the past.

This is what I'm observing:

CPU usage for the stat process: 25% flat

$ psql -c "select version()"
version
- -------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.3 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)
(1 row)

$ strace -tt -p 10773
[...]
09:47:37.867655 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
09:47:37.867738 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 823) = 823
09:47:37.867820 close(3) = 0
09:47:37.867862 munmap(0xb7ced000, 4096) = 0
09:47:37.867906 rename("global/pgstat.tmp", "global/pgstat.stat") = 0
09:47:37.868188 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868245 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\1\0\0\0\347"..., 1000, 0) = 976
09:47:37.868317 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 500}}, NULL) = 0
09:47:37.868372 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868428 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\0\0\0\0g\n\0"..., 1000, 0) = 976
09:47:37.868501 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868559 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\0\0\0\0i\n\0"..., 1000, 0) = 976
09:47:37.868629 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868687 recv(7, "\1\0\0\0\4\3\0\0\rg\0\0\v\0\0\0\0\0\0\0\0\0\0\0^\n\0\0"..., 1000, 0) = 772
09:47:37.868757 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868815 recv(7, "\1\0\0\0\240\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\354"..., 1000, 0) = 160
09:47:37.868886 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868944 recv(7, "\1\0\0\0\240\0\0\0\rg\0\0\2\0\0\0\1\0\0\0\0\0\0\0<\n\0"..., 1000, 0) = 160
09:47:37.869012 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869070 recv(7, "\1\0\0\0l\1\0\0\0\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\354\4\0"..., 1000, 0) = 364
09:47:37.869141 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869198 recv(7, "\1\0\0\0\300\2\0\0\rg\0\0\n\0\0\0\1\0\0\0\0\0\0\0007\n"..., 1000, 0) = 704
09:47:37.869267 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869329 recv(7, "\1\0\0\0(\1\0\0\rg\0\0\4\0\0\0\1\0\0\0\0\0\0\0/\n\0\0\0"..., 1000, 0) = 296
09:47:37.869398 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869456 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\1\0\0\0\347"..., 1000, 0) = 976
09:47:37.869524 --- SIGALRM (Alarm clock) @ 0 (0) ---
09:47:37.869575 sigreturn() = ? (mask now [])
09:47:37.869659 getppid() = 10768
09:47:37.869702 open("global/pgstat.tmp", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
09:47:37.869775 fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
09:47:37.869871 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ced000
09:47:37.869928 write(3, "\226\274\245\1D\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
09:47:37.870252 write(3, ";\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[...]

and doing the statistics on the system calls:

$ time strace -c -p 10773
Process 10773 attached - interrupt to quit
Process 10773 detached
% time seconds usecs/call calls errors syscall
- ------ ----------- ----------- --------- --------- ----------------
62.70 4.980721 16 307851 write
25.72 2.043299 156 13058 3039 poll
9.61 0.763046 248 3078 rename
0.64 0.050992 17 3079 open
0.49 0.038819 4 10019 recv
0.26 0.020469 7 3078 munmap
0.13 0.010344 3 3078 close
0.12 0.009425 3 3079 mmap2
0.11 0.008353 3 3079 setitimer
0.09 0.007114 2 3079 3039 sigreturn
0.07 0.005923 2 3079 fstat64
0.06 0.004734 2 3079 getppid
- ------ ----------- ----------- --------- --------- ----------------
100.00 7.943239 358636 6078 total

real 0m16.313s
user 0m1.428s
sys 0m3.802s

so instead of 32 or such rename it did the rename 3K times.

To solve the problem is it possible to kill that process? (will it be respawned?)

Regards
Gaetano Mendola

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHJZ9I7UpzwH2SGd4RAqwrAJ9vyt5fd1rdEu+uTnef6QpFYVBbhwCePB69
b0jA7Ko85TyEfMqAmVVRy/w=
=ax+l
-----END PGP SIGNATURE-----

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2007-10-29 09:12:15 Re: Stats collector on rampage (8.2.3)
Previous Message ITAGAKI Takahiro 2007-10-29 02:50:41 Re: PANIC caused by open_sync on Linux