Deadlock or other hang while vacuuming?

From: Craig Ruff <cruff(at)ucar(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Deadlock or other hang while vacuuming?
Date: 2004-11-08 18:26:15
Message-ID: 20041108182615.GA5121@ucar.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I have an annoying problem with some kind of a hang or deadlock triggered
sometimes when running a vacuum on a table with a pair of read only
cursors enumerating different subsets of rows of the same table.
(Originally, I also could have other queries that modified the table
running concurrently, but I prevented them from starting while the vacuum
query was running to narrow down the scope of the problem.)

I'm running PostgreSQL-7.4.5 on an SGI MIPS system running IRIX 6.5.24f,
compiled with the MIPSpro 7.4.2m C compiler. The application driving
the database is multithreaded, and can have numerous sessions open to
backends. I did make sure to verify I had compiled PostgreSQL with
threading enabled.

The table contains approximately 570,000 to 600,000 entries, and is defined
thusly:

CREATE TABLE seg (
id
serial8
PRIMARY KEY,
name
varchar(20)
NOT NULL,
lv_id
int4
NOT NULL
REFERENCES lv(id),
size
int8
NOT NULL
CHECK (size >= 0),
creation_time
timestamp
NOT NULL,
last_use_time
timestamp
DEFAULT timestamp 'epoch'
NOT NULL,
UNIQUE(lv_id, name)
) WITHOUT OIDS;

The enumeration sessions take a while, as the client system driving them
is slow. Each enumeration session has an exclusive backend connection,
and takes place inside a transaction. An example sequence of events
looks like this:

BEGIN;
DECLARE lsess CURSOR FOR
SELECT name, size, to_char(creation_time, 'YY.DDD'),
to_char(last_use_time, 'YY.DDD')
FROM seg WHERE lv_id = 12 AND name ~ '^M*';

(wait for a request for the next batch):

FETCH 60 FROM lsess;

(repeat as necessary)

CLOSE lsess;
COMMIT;

I have a periodic task which kicks off vacuums of all of the tables in
the database every 20 minutes. It vacuums the other tables, then runs
this query:

VACUUM ANALYZE seg;

I'm not yet certain about the relative timing of the vacuum and the
declaration of the cursors. It may be that the vacuum starts first,
or not. I haven't figured that out yet (some additional debug output
may be necessary).

What happens is that the application grinds to a halt. Looking at
core files (generated with kill -ILL <app>) shows that the vacuum query
is waiting for the result, the stack backtrace looks like this:

pqSocketPoll
pqSocketCheck
pqWaitTimed
pqWait
PQgetResult
PQexecFinish
PQexec("VACUUM ANALYZE seg;")

(When I allowed the other concurrent table modifying queries, many would
also blocked in pqSocketPoll waiting for results). This table is normally
vacuumed in less than 1 minute, but even waiting for 1.5 hours does not
change things. No backend appears to be active at that point.

Gathering information from the pg_locks table produces this:

relname | pid | mode | granted
---------------+--------+--------------------------+---------
seg | 678547 | ShareUpdateExclusiveLock | t (VACUUM)
seg | 678547 | ShareUpdateExclusiveLock | t
seg_lv_id_key | 703519 | AccessShareLock | t (CURSOR lsess #1)
seg | 703519 | AccessShareLock | t
seg_lv_id_key | 703567 | AccessShareLock | t (CURSOR lsess #2)
seg | 703567 | AccessShareLock | t
pg_class | 777441 | AccessShareLock | t
pg_locks | 777441 | AccessShareLock | t

I tried killing one of the backends handling one of the CURSORs to
see what its state looked like, but the core file was overwritten by
one from my app when it threw an exception cleaning up the aftermath. :-(
Nothing shows up in the serverlog output, other than the normal connection
and transaction log messages.

At this point, I'm ready to exclude the enumeration sessions from starting
when the vacuum is active, but I thought I'd try and gather information
just in case it is a problem in PostgreSQL.

Does anyone have any suggestions for tracking this down?

--

Craig Ruff NCAR cruff(at)ucar(dot)edu
(303) 497-1211 P.O. Box 3000
Boulder, CO 80307

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Gaetano Mendola 2004-11-08 21:44:37 Re: Possible bug: pg_hba.conf file
Previous Message SZŰCS Gábor 2004-11-08 14:21:03 Re: Killed backend won't rollback transaction?