Skip site navigation (1) Skip section navigation (2)

Re: rebellious pg stats collector (reopened case)

From: Laszlo Nagy <gandalf(at)shopzeus(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: rebellious pg stats collector (reopened case)
Date: 2008-12-19 08:54:36
Message-ID: 494B614C.5060808@shopzeus.com (view raw or flat)
Thread:
Lists: pgsql-adminpgsql-performance
Alvaro Herrera wrote:
> Laszlo Nagy wrote:
>   
>> PostgreSQL 8.3.5, the system is now stable (uptime > 10 days).  
>> PostgreSQL stats collector uses 100% CPU forever:
>>     
>
> Could you grab a few stack traces from it and post them?  Also possibly
> useful, leave strace running on the pgstat process for a while and post
> the output somewhere.
>   
Very interesting results. Before I used 'truss' on the process, it was:

78816 pgsql            1 106    0 22400K  7100K CPU6   6  24.2H 99.02% 
postgres

After I started "truss -p 78815" the first message I got was:

SIGNAL 17 (SIGSTOP)

and the it waited for some seconds. Then my screen was filled with 
messages like this:

poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M^X\^A\0\0\M^T\f\r\0"...,1000,0x0,NULL,0x0) = 408 
(0x198)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\b\0\0\0(at)\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 
(0x40)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\b\0\0\0(at)\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 
(0x40)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^O\v\b\0\^B"...,1000,0x0,NULL,0x0) = 
216 (0xd8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 
(0x1f8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^T\f\r\0\^B"...,1000,0x0,NULL,0x0) = 
216 (0xd8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 
(0x1f8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^T\f\r\0\^B"...,1000,0x0,NULL,0x0) = 
216 (0xd8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 
(0x1f8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\b\0\0\0(at)\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 
(0x40)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)

I terminated the debugger with Ctrl+C. Then I started it again:


#truss -p 78815
SIGNAL 17 (SIGSTOP)
poll({7/POLLIN|POLLERR},1,2000)                  = 0 (0x0)
getppid(0x1,0x1,0x7d0,0x8014606cc,0xffffffff80a579c0,0x7fffffffceb8) = 
95860 (0x17674)
open("global/pgstat.tmp",O_WRONLY|O_CREAT|O_TRUNC,0666) = 4 (0x4)
fstat(4,{mode=-rw------- ,inode=73223196,size=0,blksize=4096}) = 0 (0x0)
write(4,"\M^W\M-<\M-%\^A\^F\n\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-Z\^C\0\0\0\0\0\0\M-d\M^E\M-0u"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0+\0\0\0\0\0\0\0\^N\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\^C\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T\^T\^N\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\^R\M-(\M^[\v\M-^\M-0A\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\^N\0\0\0\0\0\0\0\\n\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0T\M-A\n\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0~\M-d\M-fp\n\M-^\M-0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\^E\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"$\0\0\0\0\0\0+\^V\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0Z\M-F%"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0|\^B\0\0\0\0\0\0<\M-I\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0T\M-f\t\^V\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\M-H'\M-4\M-N\n\M-^\M-0A"...,4096) = 4096 (0x1000)
write(4,"\0\0\^N\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\b\0\0\0\0\0L\^B\0\0\0\0\0\0\M^^"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0dD\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\09\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0>\M-h\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T%;\r\0\0\0\0\0 \0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-I\\j\n\M-^\M-0A\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0Y\240\^A\0\0\0\0\0\^C\M-3"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\^N\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0P\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\^P\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\M-P\^R\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,3119) = 3119 (0xc2f)
close(4)                     = 0 (0x0)
rename("global/pgstat.tmp","global/pgstat.stat") = 0 (0x0)
process exit, rval = 0

I did NOT send the SIGSTOP signal to the process. The pgstat collector 
was running at 100%CPU for 24 hours, when I attached the debugger, and 
attaching the debugger somehow caused the process to get a SIGSTOP, but 
the process did not exit. When I started the debugger for the second 
time, it got a SIGSTOP again, and this made the collector exiting the 
endless loop (instead of stopping it).

What is the next step?

   Laszlo


In response to

Responses

pgsql-performance by date

Next:From: Alvaro HerreraDate: 2008-12-19 13:04:46
Subject: Re: rebellious pg stats collector (reopened case)
Previous:From: sathiya psqlDate: 2008-12-19 08:00:36
Subject: Re: pg_restore : out of memory

pgsql-admin by date

Next:From: Alvaro HerreraDate: 2008-12-19 13:04:46
Subject: Re: rebellious pg stats collector (reopened case)
Previous:From: Alvaro HerreraDate: 2008-12-19 03:09:59
Subject: Re: reindexdb dying with SIGPIPE on 8.2.5

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group