Re: rebellious pg stats collector (reopened case)

From: Laszlo Nagy <gandalf(at)shopzeus(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: rebellious pg stats collector (reopened case)
Date: 2008-12-22 10:11:30
Message-ID: 494F67D2.3090406@shopzeus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

OK, here is the new test.

shopzeus# ps l -p 39766 -p 39767 -p 39769
UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
70 39766 78806 0 96 0 451960 423896 select Ss ?? 8:41.85
postgres: writer process (postgres)
70 39767 78806 0 96 0 451960 7184 select Ss ?? 0:58.75
postgres: wal writer process (postgres)
70 39769 78806 0 96 0 22400 7008 select Ss ?? 14:42.39
postgres: stats collector process (postgres)

top -otime 10

last pid: 7984; load averages: 1.38, 1.51,
1.49
up 16+18:18:22 05:02:20
170 processes: 3 running, 161 sleeping, 1 stopped, 5 zombie
CPU states: % user, % nice, % system, % interrupt, %
idle
Mem: 959M Active, 5947M Inact, 562M Wired, 275M Cache, 214M Buf, 164M Free
Swap: 16G Total, 1088K Used, 16G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU
COMMAND
78806 pgsql 1 98 0 441M 6960K select 4 67:14 0.39%
postgres
39769 pgsql 1 96 0 22400K 7008K select 7 14:43 0.00%
postgres
39766 pgsql 1 96 0 441M 414M select 3 8:42 0.00%
postgres
10602 root 1 96 0 149M 18916K select 4 5:30 0.00% httpd
67535 root 1 4 0 3720K 1336K kqread 0 3:54 0.00%
dovecot
795 root 1 4 0 8188K 1452K kqread 4 3:21 0.00%
master
95791 shopzeus 3 116 20 59892K 17440K select 3 2:22 0.00%
python
32397 postfix 1 4 0 8280K 1980K kqread 6 1:23 0.00% qmgr
67536 root 1 4 0 5996K 1448K kqread 4 1:07 0.00%
dovecot-auth
525 root 1 96 0 4684K 1156K select 3 1:03 0.00%
syslogd

In other words, the stats collector is running since 14 hours.

shopzeus# date
Mon Dec 22 05:03:24 EST 2008
shopzeus# ls -l ~pgsql/data/global/pgstat.stat
-rw------- 1 pgsql pgsql 232084 Dec 22 05:03
/usr/local/pgsql/data/global/pgstat.stat

Looks like it is updating the stats file continuously. Now I try to trace:

shopzeus# truss -p 39769
poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) =
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) =
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) =
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1)
recvfrom(7,"\^A\0\0\0X\^B\0\0\M^T\f\r\0\^F\0"...,1000,0x0,NULL,0x0) =
600 (0x258)
poll({7/POLLIN|POLLERR},1,2000) ERR#4 'Interrupted
system call'
SIGNAL 14 (SIGALRM)
sigreturn(0x7fffffffca70,0x0,0x7fffffffca70,0xffffffff809fb5c0,0x7fffffffffc0,0x7fffffffca60)
ERR#4 'Interrupted system call'
getppid(0x1,0x1,0x801901120,0x4,0x7fffffffd39f,0x6) = 11204 (0x2bc4)
open("global/pgstat.tmp",O_WRONLY|O_CREAT|O_TRUNC,0666) = 4 (0x4)
fstat(4,{mode=-rw------- ,inode=73223197,size=0,blksize=4096}) = 0 (0x0)
write(4,"\M^W\M-<\M-%\^A`\^B\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"l\0\0\0\0\0\0\0\M-T\M-s.\M-`\M-r"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\t\0\0\0\0\0\0\0\^C"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\^C\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T\^T\^N\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-aKo\M-s\M-a\M-0A\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\^C\0\0\0\0\0\0\0\M-&"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0T\M-A\n\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\M-_o\M-T\M-u\M-p"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\^E\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\^A\0\0\0\0\0\0\^?\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0X\M-<"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\b\^D(at)f\0\0\0\0\0\0\0F \0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0T\M-2\f\b\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0;\0\0\0\0\0\0\0\^R\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T_\n\0\0\0\0\0\0\M-*R\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"r\M^W\M-`\M-p\M-a\M-0A\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-a\M-p\M-a\M-0A\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0c\^Q\^A\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\^F\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0Z\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\^D\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,2708) = 2708 (0xa94)
close(4) = 0 (0x0)
rename("global/pgstat.tmp","global/pgstat.stat") = 0 (0x0)
process exit, rval = 0

And the stats collector exited! It cannot be by accident. Last time I
checked, it had the same problem: tried to poll something. It was in an
inifinte loop, polling something but always got POLERR. Then because I
started to trace it, it exited with rval=0 or rval=1.

I do not understand why it exited after I started to trace it? Is this
some kind of bug?

Thanks,

Laszlo

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Alvaro Herrera 2008-12-22 12:28:11 publishing changelogs on pgweb
Previous Message Alvaro Herrera 2008-12-22 00:46:48 Re: reindexdb dying with SIGPIPE on 8.2.5

Browse pgsql-performance by date

  From Date Subject
Next Message Gregory Stark 2008-12-22 10:35:35 Re: dbt-2 tuning results with postgresql-8.3.5
Previous Message Greg Smith 2008-12-22 08:59:42 Re: dbt-2 tuning results with postgresql-8.3.5