Re: Autovacuum daemon terminated by signal 11

From: Richard Huxton <dev(at)archonet(dot)com>
To: Justin Pasher <justinp(at)newmediagateway(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Autovacuum daemon terminated by signal 11
Date: 2009-01-15 09:47:41
Message-ID: 496F063D.1020308@archonet.com
Lists: pgsql-general pgsql-hackers

Justin Pasher wrote:
> Hello,
>
> I have a server running PostgreSQL 8.1.15-0etch1 (Debian etch) that was
> recently put into production. Last week a developer started having a problem
> with his psql connection being terminated every couple of minutes when he
> was running a query. When I looked through the logs, I noticed this message:
>
> 2009-01-09 08:09:46 CST LOG: autovacuum process (PID 15012) was terminated
> by signal 11

Signal 11 is a segmentation fault - probably a bug or bad RAM.

> I looked through the logs some more and I noticed that this was occurring
> every minute or so. The database is a pretty heavily utilized system
> (judging by the age(datfrozenxid) from pg_database, the system had run
> approximately 500 million queries in less than a week). I noticed that right
> before every autovacuum termination, it tried to autovacuum a database.
>
> 2009-01-09 08:09:46 CST LOG: transaction ID wrap limit is 4563352, limited
> by database "database_name"
>
> It was always showing the same database, so I decided to manually vacuum the
> database. Once that was done (it was successful the first time without
> errors), the problem seemed to go away. I went ahead and manually vacuumed
> the remaining databases just to take care of the potential xid wraparound
> issue.

I'd be suspicious of possible corruption in autovacuum's internal data.
Can you trace these problems back to a power outage or system crash? It
doesn't look like "database_name" itself is damaged, since you vacuumed
that successfully. If autovacuum is running normally now, that might
indicate it was something in the way autovacuum was keeping track of
"database_name".

It's also probably worth running some memory tests on the server
(memtest86 or similar) to see if that shows anything. Was it *always*
the autovacuum process getting sig11? If not, then it might just be a
pattern of usage that makes it more likely to hit some bad RAM.
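
One rough way to answer the "was it always autovacuum?" question is to
count the signal 11 terminations in the server log by process type. The
sketch below is only an illustration: the function name count_crashes is
made up, and the pattern assumes every crash line has the same shape as
the autovacuum line quoted above ("<process type> (PID n) was terminated
by signal n"), which may not hold for every log_line_prefix setting.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt quoted in this thread:
#   2009-01-09 08:09:46 CST LOG: autovacuum process (PID 15012) was terminated by signal 11
CRASH_RE = re.compile(
    r"LOG:\s+(?P<kind>.+?) \(PID (?P<pid>\d+)\) was terminated\s+by signal (?P<sig>\d+)"
)


def count_crashes(lines, signal=11):
    """Count 'terminated by signal N' log lines, grouped by process type."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match and int(match.group("sig")) == signal:
            counts[match.group("kind")] += 1
    return counts
```

Feeding it the lines of the server log (for example
count_crashes(open(path)) with your own log path) gives a per-process
tally. If only "autovacuum process" shows up, that points more towards
autovacuum's own state; if other process types appear as well, bad RAM
becomes the more likely suspect.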

--
Richard Huxton
Archonet Ltd
