Re: Autovacuum seems to block database: WARNING worker took too long to start

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pablo Delgado Díaz-Pache <delgadop(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Autovacuum seems to block database: WARNING worker took too long to start
Date: 2010-11-18 14:49:18
Message-ID: 1290091390-sup-7218@alvh.no-ip.org
Lists: pgsql-admin
Excerpts from Pablo Delgado Díaz-Pache's message of Thu Nov 18 08:57:16 -0300 2010:

> 2) We did a strace to the postmaster pid. However we had 2 postmasters not
> dead
> 
> # ps -fea |grep -i postmaster
> postgres  3889     1  0 Nov16 ?        00:01:24 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
> postgres  7601  3889  0 12:37 ?        00:00:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
> 
> As soon as we did a "strace" to the 3889 pid everything started to work
> again.

Sorry for my previous response -- evidently I failed to scroll down
enough to notice this part.

It seems to me that this process was stuck in an unnatural way.

> Not sure it was a coincidence but that was the way it was.
> 
> # strace -p 3889
> Process 3889 attached - interrupt to quit
> select(6, [3 4 5], NULL, NULL, {56, 930000}) = ? ERESTARTNOHAND (To be restarted)
> --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
> rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0

This seems like normal postmaster activity: receiving SIGUSR1, then SIGCHLD,
and doing stuff accordingly.

Rather than a coincidence, I would think that the act of tracing it made
it come back to life.  A kernel bug maybe?  Have you upgraded your
kernel or libc lately?
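
To answer the kernel/libc question, it helps to first record what the
machine is running right now.  A minimal sketch in Python, assuming an
interpreter is available on the server; the function name
runtime_versions is made up for illustration, and this only reports the
current versions, not when they were installed (the package manager's
log would be the place to look for that):

```python
import platform


def runtime_versions():
    """Report the running kernel release and the C library in use.

    platform.libc_ver() inspects the Python executable, so it reports the
    libc that the interpreter is linked against; this is assumed to be the
    same system libc the postmaster uses.
    """
    libc_name, libc_version = platform.libc_ver()
    return {
        "kernel": platform.release(),
        # Both parts may be empty strings on platforms without glibc.
        "libc": (libc_name + " " + libc_version).strip(),
    }


print(runtime_versions())
```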

> I also straced the other postmaster pid
> 
> # strace -p 7601
> Process 7601 attached - interrupt to quit
> recvfrom(8, "P\0\0\0\221\0select id_key from transla"..., 8192, 0, NULL, NULL) = 181

This one seems like a regular postmaster child that hadn't gotten around
to changing its ps status yet.  (Note it had PPID 3889 which is
consistent with this idea.)
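
The PPID check can be done mechanically when there are more than two
entries to look at.  A rough sketch in Python, assuming the column
layout of the `ps -fea` output quoted above (UID PID PPID C STIME TTY
TIME CMD); the helper names parse_ps_line and classify_postmasters are
hypothetical:

```python
def parse_ps_line(line):
    """Split one `ps -fea` line into (pid, ppid, command)."""
    # The first seven columns are whitespace-separated; the rest is the command.
    fields = line.split(None, 7)
    return int(fields[1]), int(fields[2]), fields[7]


def classify_postmasters(lines):
    """Return (parents, children) as lists of PIDs.

    A process is counted as a child when its PPID is the PID of another
    listed process; such an entry is most likely a freshly forked backend
    that has not yet changed its ps title.
    """
    procs = [parse_ps_line(line) for line in lines if line.strip()]
    pids = {pid for pid, _, _ in procs}
    parents = [pid for pid, ppid, _ in procs if ppid not in pids]
    children = [pid for pid, ppid, _ in procs if ppid in pids]
    return parents, children


# The two lines from the quoted ps output, with the wrapped command rejoined.
sample = [
    "postgres  3889     1  0 Nov16 ?        00:01:24 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data",
    "postgres  7601  3889  0 12:37 ?        00:00:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data",
]
parents, children = classify_postmasters(sample)
print("parent postmaster:", parents)   # [3889]
print("forked children:", children)    # [7601]
```

Two entries that both have PPID 1 would be a different situation from
the one in this thread and would deserve a closer look.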

-- 
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
