Skip site navigation (1) Skip section navigation (2)

Re: rebellious pg stats collector (reopened case)

From: Laszlo Nagy <gandalf(at)shopzeus(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: rebellious pg stats collector (reopened case)
Date: 2008-12-22 10:11:30
Message-ID: 494F67D2.3090406@shopzeus.com (view raw or flat)
Thread:
Lists: pgsql-adminpgsql-performance
OK, here is the new test.

shopzeus# ps l -p 39766 -p 39767 -p 39769
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
   70 39766 78806   0  96  0 451960 423896 select Ss    ??    8:41.85 
postgres: writer process    (postgres)
   70 39767 78806   0  96  0 451960  7184 select Ss    ??    0:58.75 
postgres: wal writer process    (postgres)
   70 39769 78806   0  96  0 22400  7008 select Ss    ??   14:42.39 
postgres: stats collector process    (postgres)


top -otime 10


last pid:  7984;  load averages:  1.38,  1.51,  
1.49                                                                                
up 16+18:18:22  05:02:20
170 processes: 3 running, 161 sleeping, 1 stopped, 5 zombie
CPU states:     % user,     % nice,     % system,     % interrupt,     % 
idle
Mem: 959M Active, 5947M Inact, 562M Wired, 275M Cache, 214M Buf, 164M Free
Swap: 16G Total, 1088K Used, 16G Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU 
COMMAND
78806 pgsql            1  98    0   441M  6960K select 4  67:14  0.39% 
postgres
39769 pgsql            1  96    0 22400K  7008K select 7  14:43  0.00% 
postgres
39766 pgsql            1  96    0   441M   414M select 3   8:42  0.00% 
postgres
10602 root             1  96    0   149M 18916K select 4   5:30  0.00% httpd
67535 root             1   4    0  3720K  1336K kqread 0   3:54  0.00% 
dovecot
  795 root             1   4    0  8188K  1452K kqread 4   3:21  0.00% 
master
95791 shopzeus         3 116   20 59892K 17440K select 3   2:22  0.00% 
python
32397 postfix          1   4    0  8280K  1980K kqread 6   1:23  0.00% qmgr
67536 root             1   4    0  5996K  1448K kqread 4   1:07  0.00% 
dovecot-auth
  525 root             1  96    0  4684K  1156K select 3   1:03  0.00% 
syslogd

In other words, the stats collector is running since 14 hours.

shopzeus# date
Mon Dec 22 05:03:24 EST 2008
shopzeus# ls -l ~pgsql/data/global/pgstat.stat
-rw-------  1 pgsql  pgsql  232084 Dec 22 05:03 
/usr/local/pgsql/data/global/pgstat.stat

Looks like it is updating the stats file continuously. Now I try to trace:


shopzeus# truss -p 39769
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0X\^B\0\0\M^T\f\r\0\^F\0"...,1000,0x0,NULL,0x0) = 
600 (0x258)
poll({7/POLLIN|POLLERR},1,2000)                  ERR#4 'Interrupted 
system call'
SIGNAL 14 (SIGALRM)
sigreturn(0x7fffffffca70,0x0,0x7fffffffca70,0xffffffff809fb5c0,0x7fffffffffc0,0x7fffffffca60) 
ERR#4 'Interrupted system call'
getppid(0x1,0x1,0x801901120,0x4,0x7fffffffd39f,0x6) = 11204 (0x2bc4)
open("global/pgstat.tmp",O_WRONLY|O_CREAT|O_TRUNC,0666) = 4 (0x4)
fstat(4,{mode=-rw------- ,inode=73223197,size=0,blksize=4096}) = 0 (0x0)
write(4,"\M^W\M-<\M-%\^A`\^B\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"l\0\0\0\0\0\0\0\M-T\M-s.\M-`\M-r"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\t\0\0\0\0\0\0\0\^C"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\^C\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T\^T\^N\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-aKo\M-s\M-a\M-0A\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\^C\0\0\0\0\0\0\0\M-&"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0T\M-A\n\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\M-_o\M-T\M-u\M-p"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\^E\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\^A\0\0\0\0\0\0\^?\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0X\M-<"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\b\^D(at)f\0\0\0\0\0\0\0F \0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0T\M-2\f\b\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0;\0\0\0\0\0\0\0\^R\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T_\n\0\0\0\0\0\0\M-*R\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"r\M^W\M-`\M-p\M-a\M-0A\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-a\M-p\M-a\M-0A\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0c\^Q\^A\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\^F\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0Z\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\^D\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,2708) = 2708 (0xa94)
close(4)                     = 0 (0x0)
rename("global/pgstat.tmp","global/pgstat.stat") = 0 (0x0)
process exit, rval = 0

And the stats collector exited! It cannot be by accident. Last time I 
checked, it had the same problem: tried to poll something. It was in an 
inifinte loop, polling something but always got POLERR. Then because I 
started to trace it, it exited with rval=0 or rval=1.

I do not understand why it exited after I started to trace it? Is this 
some kind of bug?

Thanks,

   Laszlo


In response to

Responses

pgsql-performance by date

Next:From: Gregory StarkDate: 2008-12-22 10:35:35
Subject: Re: dbt-2 tuning results with postgresql-8.3.5
Previous:From: Greg SmithDate: 2008-12-22 08:59:42
Subject: Re: dbt-2 tuning results with postgresql-8.3.5

pgsql-admin by date

Next:From: Alvaro HerreraDate: 2008-12-22 12:28:11
Subject: publishing changelogs on pgweb
Previous:From: Alvaro HerreraDate: 2008-12-22 00:46:48
Subject: Re: reindexdb dying with SIGPIPE on 8.2.5

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group