BUG #5321: Parallel restore temporarily deadlocked by autovacuum analyze

From: "Jerry Gamache" <jerry(dot)gamache(at)idilia(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5321: Parallel restore temporarily deadlocked by autovacuum analyze
Date: 2010-02-09 21:55:32
Message-ID: 201002092155.o19LtWq9001298@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5321
Logged by: Jerry Gamache
Email address: jerry(dot)gamache(at)idilia(dot)com
PostgreSQL version: 8.4.2
Operating system: Linux
Description: Parallel restore temporarily deadlocked by autovacuum
analyze
Details:

While restoring a database in 8.4.2, pg_restore stopped for more than 30
minutes doing nothing:

pg_restore -d database1 -j 8 -v database1.pg_dump

The parallel restore showed very impressive performance as the verbose
output went by, but the restore suddenly stopped. The output of "ps" after
15 minutes of 0% CPU activity showed 4 process in "Ss" state:

postgres: xxx database1 [local] CREATE INDEX waiting
postgres: xxx database1 [local] CREATE INDEX waiting
postgres: autovacuum worker process database1
postgres: autovacuum worker process database1

The output of pg_stat_activity for database1 showed:

database1=# select procpid,datname,current_query from pg_stat_activity where
datname='database1' ORDER BY procpid;
procpid | datname |
current_query
---------+------------------+-----------------------------------------------
----------------------------------------------------------------
6146 | database1 | autovacuum: ANALYZE public.table_x
6270 | database1 |
: CREATE UNIQUE INDEX index1 ON table_x USING
btree (col_a, col_b);
6271 | database1 |
: CREATE UNIQUE INDEX index2 ON table_x USING
btree (col_b, col_c, col_d);
6307 | database1 | autovacuum: ANALYZE public.table_y
6597 | database1 | <IDLE>
6626 | database1 | select procpid,datname,current_query from
pg_stat_activity where datname='database1' ORDER BY procpid;
(6 rows)

The output of pg_locks for the 4 stuck PIDs showed:

database1=# select * from pg_locks where pid IN (SELECT procpid FROM
pg_stat_activity WHERE datname='database1') ORDER BY pid;
locktype | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid |
mode | granted
------------+----------+----------+------+-------+------------+-------------
--+---------+-------+----------+--------------------+------+----------------
----------+---------
relation | 16993 | 17597 | | | |
| | | | 11/57 | 6146 | AccessShareLock
| t
relation | 16993 | 17616 | | | |
| | | | 11/57 | 6146 | AccessShareLock
| t
relation | 16993 | 17663 | | | |
| | | | 11/57 | 6146 | AccessShareLock
| t
virtualxid | | | | | 11/57 |
| | | | 11/57 | 6146 | ExclusiveLock
| t
relation | 16993 | 17165 | | | |
| | | | 11/57 | 6146 |
ShareUpdateExclusiveLock | t
virtualxid | | | | | 13/100 |
| | | | 13/100 | 6270 | ExclusiveLock
| t
relation | 16993 | 17165 | | | |
| | | | 13/100 | 6270 | ShareLock
| f
relation | 16993 | 17165 | | | |
| | | | 14/50 | 6271 | ShareLock
| f
virtualxid | | | | | 14/50 |
| | | | 14/50 | 6271 | ExclusiveLock
| t
relation | 16993 | 17188 | | | |
| | | | 3/844 | 6307 |
ShareUpdateExclusiveLock | t
virtualxid | | | | | 3/844 |
| | | | 3/844 | 6307 | ExclusiveLock
| t
relation | 16993 | 17660 | | | |
| | | | 3/844 | 6307 | AccessShareLock
| t
relation | 16993 | 17590 | | | |
| | | | 3/844 | 6307 | AccessShareLock
| t
relation | 0 | 2677 | | | |
| | | | 7/187 | 6626 | AccessShareLock
| t
relation | 0 | 2676 | | | |
| | | | 7/187 | 6626 | AccessShareLock
| t
relation | 16993 | 11042 | | | |
| | | | 7/187 | 6626 | AccessShareLock
| t
relation | 0 | 2671 | | | |
| | | | 7/187 | 6626 | AccessShareLock
| t
relation | 0 | 2672 | | | |
| | | | 7/187 | 6626 | AccessShareLock
| t
virtualxid | | | | | 7/187 |
| | | | 7/187 | 6626 | ExclusiveLock
| t
relation | 16993 | 10969 | | | |
| | | | 7/187 | 6626 | AccessShareLock
| t
relation | 0 | 1262 | | | |
| | | | 7/187 | 6626 | AccessShareLock
| t
relation | 0 | 1260 | | | |
| | | | 7/187 | 6626 | AccessShareLock
| t
(22 rows)

The restore resumed while I was writing this report, and I saw these new
entries in the logs:

ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "database1.public.table_y"
ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "database1.public.table_y"

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2010-02-09 21:59:32 Re: unable to fail over to warm standby server
Previous Message Heikki Linnakangas 2010-02-09 19:47:23 Re: unable to fail over to warm standby server