Re: Autovacuum seems to block database: WARNING worker took too long to start

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pablo Delgado Díaz-Pache <delgadop(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Autovacuum seems to block database: WARNING worker took too long to start
Date: 2010-11-18 14:49:18
Message-ID: 1290091390-sup-7218@alvh.no-ip.org
Lists: pgsql-admin

Excerpts from Pablo Delgado Díaz-Pache's message of jue nov 18 08:57:16 -0300 2010:

> 2) We did a strace to the postmaster pid. However we had 2 postmasters not
> dead
>
> # ps -fea |grep -i postmaster
> postgres 3889 1 0 Nov16 ? 00:01:24 /usr/bin/postmaster -p 5432
> -D /var/lib/pgsql/data
> postgres 7601 3889 0 12:37 ? 00:00:00 /usr/bin/postmaster -p 5432
> -D /var/lib/pgsql/data
>
> As soon as we did a "strace" to the 3889 pid everything started to work
> again.

Sorry for my previous response -- evidently I failed to scroll down
enough to notice this part.

It seems to me that this process was stuck in an unnatural way.

> Not sure it was a coincidence but that was the way it was.
>
> *# strace -p 3889*
> *Process 3889 attached - interrupt to quit*
> *select(6, [3 4 5], NULL, NULL, {56, 930000}) = ? ERESTARTNOHAND (To be
> restarted)*
> *--- SIGUSR1 (User defined signal 1) @ 0 (0) ---*
> *rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN
> RT_1], NULL, 8) = 0*

This looks like normal postmaster activity: receiving SIGUSR1, then SIGCHLD,
and acting on each accordingly.
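
For readers unfamiliar with the pattern in that trace (a process sleeping in
select() with a timeout, woken by a signal, then acting on it), here is a
minimal Python sketch. It is not PostgreSQL's code: the names `note_signal`,
`install_handlers` and `wait_once` are made up for illustration, and the
wakeup pipe is a Python idiom swapped in because Python retries an
interrupted select() on its own. It assumes a Unix system and must run in
the main thread.

```python
import os
import select
import signal

# Names of signals that have arrived but have not been acted on yet.
pending = set()

def note_signal(signum, frame):
    # The handler does the minimum: record the signal for the main loop.
    pending.add(signal.Signals(signum).name)

def install_handlers():
    """Install the handlers and return the read end of a wakeup pipe."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.set_blocking(w, False)   # set_wakeup_fd() requires a non-blocking fd
    signal.set_wakeup_fd(w)     # a byte is written here on every signal
    for sig in (signal.SIGUSR1, signal.SIGCHLD):
        signal.signal(sig, note_signal)
    return r

def wait_once(wakeup_r, listen_fds=(), timeout=60.0):
    """Sleep in select() until a descriptor is readable, a signal arrives,
    or the timeout expires. Returns (signal names, readable descriptors)."""
    ready, _, _ = select.select([wakeup_r, *listen_fds], [], [], timeout)
    if wakeup_r in ready:
        try:
            os.read(wakeup_r, 4096)   # drain the wakeup bytes
        except BlockingIOError:
            pass
    seen = sorted(pending)
    pending.difference_update(seen)
    return seen, [fd for fd in ready if fd != wakeup_r]
```

Calling `wait_once()` in a loop gives the general shape visible in the
strace output: a long sleep in select(), cut short whenever a signal comes
in. A process in this state that never wakes up even though signals are
being sent to it is not behaving as this loop would.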

Rather than a coincidence, I would think that the act of tracing it made
it come back to life. A kernel bug maybe? Have you upgraded your
kernel or libc lately?

> I also straced the other postmaster pid
>
> *# strace -p 7601*
> *Process 7601 attached - interrupt to quit*
> *recvfrom(8, "P\0\0\0\221\0select id_key from transla"..., 8192, 0, NULL,
> NULL) = 181*

This one seems like a regular postmaster child that hadn't gotten around
to changing its ps status yet. (Note it had PPID 3889 which is
consistent with this idea.)
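
The PPID check can be done mechanically. The following is a small sketch,
assuming the usual `ps -f` column order (UID PID PPID C STIME TTY TIME CMD);
the helper names `parse_ps_line` and `classify` are hypothetical, and the
sample text is the quoted ps output with its wrapped lines rejoined.

```python
from typing import NamedTuple

class Proc(NamedTuple):
    uid: str
    pid: int
    ppid: int
    cmd: str

def parse_ps_line(line: str) -> Proc:
    # Assumed ps -f columns: UID PID PPID C STIME TTY TIME CMD
    fields = line.split(None, 7)
    return Proc(fields[0], int(fields[1]), int(fields[2]), fields[7])

def classify(procs):
    """Label each PID as the parent postmaster or as a child of one."""
    pids = {p.pid for p in procs}
    roles = {}
    for p in procs:
        # A process whose parent is also in the list was forked by it.
        roles[p.pid] = f"child of {p.ppid}" if p.ppid in pids else "postmaster"
    return roles

PS_OUTPUT = """\
postgres 3889 1 0 Nov16 ? 00:01:24 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 7601 3889 0 12:37 ? 00:00:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
"""

procs = [parse_ps_line(line) for line in PS_OUTPUT.splitlines() if line.strip()]
roles = classify(procs)
for pid, role in roles.items():
    print(pid, role)
```

Only a process whose parent is outside the list (here PID 1) is treated as
the real postmaster; anything with a postmaster as its parent is one of its
children, whatever its command line still says.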

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
