Re: Autovacuum daemon terminated by signal 11

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pasher <justinp(at)newmediagateway(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Autovacuum daemon terminated by signal 11
Date: 2009-01-16 23:43:09
Message-ID: 15221.1232149389@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

I wrote:
> ... and you've seemingly not managed to install the debug symbols where
> gdb can find them.

But never mind that --- it turns out to be trivial to reproduce the
crash. Just create a database, set its datfrozenxid and datvacuumxid
far in the past (via a manual update of pg_database), enable autovacuum,
and wait a bit.

What is happening is that autovacuum_do_vac_analyze contains

old_cxt = MemoryContextSwitchTo(AutovacMemCxt);
...
vacuum(vacstmt, relids);
...
MemoryContextSwitchTo(old_cxt);

and at the time it is called by process_whole_db, CurrentMemoryContext
points at TopTransactionContext. Which gets destroyed because vacuum()
internally finishes that transaction and starts a new one. When we
come out of vacuum(), CurrentMemoryContext again points at
TopTransactionContext, but *it's not the same one*. The closing
MemoryContextSwitchTo is installing a stale pointer, which then remains
active into CommitTransaction. It's a wonder this code ever works.
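
To make the failure mode concrete, here is a toy model of the save/restore
pattern in plain C. It is only a sketch: ToyContext and every toy_* name are
hypothetical stand-ins invented for illustration, not PostgreSQL code, and
"destroying" a context merely clears a flag so that the stale pointer can be
inspected safely instead of dereferencing freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context; not PostgreSQL's struct. */
typedef struct ToyContext {
    const char *name;
    bool alive;              /* false once the context is "destroyed" */
} ToyContext;

/* Fixed pool, so destroyed contexts stay inspectable rather than freed. */
static ToyContext pool[8];
static size_t pool_used = 0;

static ToyContext *current_context = NULL;
static ToyContext *top_transaction_context = NULL;

static ToyContext *toy_create(const char *name) {
    assert(pool_used < sizeof(pool) / sizeof(pool[0]));
    ToyContext *cxt = &pool[pool_used++];
    cxt->name = name;
    cxt->alive = true;
    return cxt;
}

/* Same contract as MemoryContextSwitchTo: install cxt, return the old one. */
static ToyContext *toy_switch_to(ToyContext *cxt) {
    ToyContext *old = current_context;
    current_context = cxt;
    return old;
}

static void toy_start_transaction(void) {
    top_transaction_context = toy_create("TopTransactionContext");
    current_context = top_transaction_context;
}

static void toy_commit_transaction(void) {
    top_transaction_context->alive = false;   /* context is destroyed */
    top_transaction_context = NULL;
    current_context = NULL;
}

/* Stand-in for vacuum(): finishes the transaction and starts a new one. */
static void toy_vacuum(void) {
    toy_commit_transaction();
    toy_start_transaction();
}

/* The save/restore pattern; returns true if it leaves a dead context current. */
static bool toy_do_vac_analyze_is_stale(ToyContext *autovac_cxt) {
    ToyContext *old_cxt = toy_switch_to(autovac_cxt);
    toy_vacuum();
    toy_switch_to(old_cxt);                   /* may install a stale pointer */
    return !current_context->alive;
}

/* Caller enters with the transaction's own context current. */
static bool toy_whole_db_path_is_stale(void) {
    pool_used = 0;
    toy_start_transaction();
    ToyContext *autovac_cxt = toy_create("AutovacMemCxt");
    return toy_do_vac_analyze_is_stale(autovac_cxt);
}

/* Caller enters with the long-lived context already current. */
static bool toy_table_path_is_stale(void) {
    pool_used = 0;
    toy_start_transaction();
    ToyContext *autovac_cxt = toy_create("AutovacMemCxt");
    toy_switch_to(autovac_cxt);
    return toy_do_vac_analyze_is_stale(autovac_cxt);
}
```

In this model toy_whole_db_path_is_stale() returns true and
toy_table_path_is_stale() returns false: the saved pointer is only safe to
restore when it refers to a context that outlives the transaction.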

The other path through do_autovacuum() escapes this fate because it
enters autovacuum_do_vac_analyze with CurrentMemoryContext pointing
at AutovacMemCxt, which isn't going to go away.

I argue that autovacuum_do_vac_analyze shouldn't attempt to restore the
caller's memory context at all. One possible approach is to make it
re-select AutovacMemCxt at exit, but I wonder if we shouldn't define
its entry and exit conditions as current context being
(the current instance of) TopTransactionContext.
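
The two exit conventions can be sketched against a toy model in plain C. This
is an illustration under stated assumptions, not the actual patch: ToyContext,
the toy_* helpers, and both do_vac_analyze_* functions are hypothetical names,
and a "destroyed" context is only flagged, never freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context; not PostgreSQL's struct. */
typedef struct ToyContext {
    const char *name;
    bool alive;
} ToyContext;

static ToyContext pool[8];
static size_t pool_used = 0;

static ToyContext *current_context = NULL;
static ToyContext *top_transaction_context = NULL;
static ToyContext *autovac_context = NULL;

static ToyContext *toy_create(const char *name) {
    assert(pool_used < sizeof(pool) / sizeof(pool[0]));
    ToyContext *cxt = &pool[pool_used++];
    cxt->name = name;
    cxt->alive = true;
    return cxt;
}

static void toy_start_transaction(void) {
    top_transaction_context = toy_create("TopTransactionContext");
    current_context = top_transaction_context;
}

/* Stand-in for vacuum(): destroys the old transaction context, makes a new one. */
static void toy_vacuum(void) {
    top_transaction_context->alive = false;
    toy_start_transaction();
}

/* Fresh state: a long-lived context plus an open transaction. */
static void toy_setup(void) {
    pool_used = 0;
    autovac_context = toy_create("AutovacMemCxt");
    toy_start_transaction();
}

/* Option 1: always leave the long-lived context selected at exit. */
static void do_vac_analyze_exit_autovac(void) {
    current_context = autovac_context;
    toy_vacuum();
    current_context = autovac_context;
}

/* Option 2: entry and exit condition is "current TopTransactionContext". */
static void do_vac_analyze_exit_toptxn(void) {
    assert(current_context == top_transaction_context);  /* entry condition */
    current_context = autovac_context;
    toy_vacuum();
    /* Re-read the global; never reuse a pointer saved before toy_vacuum(). */
    current_context = top_transaction_context;
}
```

Neither variant saves the caller's context across the call, so neither can
reinstall a pointer to a context that toy_vacuum() has already destroyed.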

It looks like 8.3 and HEAD take the latter approach and are therefore
safe from this bug. 8.2 seems to escape it also because it doesn't have
process_whole_db anymore, but it's certainly not
autovacuum_do_vac_analyze's fault that it's not broken, because it's still
trying to restore a context that it has no right to assume still exists.

Alvaro, you want to take charge of fixing this?

regards, tom lane
