BUG #4730: Vacuum full verbose analyze "deadlock"

From: "Wayne Conrad" <wconrad(at)yagni(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4730: Vacuum full verbose analyze "deadlock"
Date: 2009-03-24 23:56:20
Message-ID: 200903242356.n2ONuKaG076228@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 4730
Logged by: Wayne Conrad
Email address: wconrad(at)yagni(dot)com
PostgreSQL version: 8.3.6
Operating system: Debian 4.3.1-1
Description: Vacuum full verbose analyze "deadlock"
Details:

"VACUUM FULL ANALYZE VERBOSE" on a "deadlocks" with other SQL statements.
No deadlock detection messages appear in the Postgres log, which is why I
have put "deadlock" in quotes: It's a deadlock in the computer science
meaning of the word, but perhaps not in the Postgres internal meaning of the
word. Specifically, the VACUUM appears to be waiting on SQL statements
which are in turn waiting on the VACUUM. That condition continues
indefinitely until the VACUUM is manually terminated (a control-C in psql),
at which time all locks appear to be released and processing continues
normally.

A query of pg_locks shows the condition. We've written a program that
queries pg_locks and shows who is waiting on whom. here is a snippet of
its output for one of these incidents. If a raw query of that table (or
some other) would be better, please let me know what that query should be.

----------

Lock:

locktype=relation relation=jobs
granted=false transaction=4/900 pid=10247 mode=AccessExclusiveLock
query = vacuum full verbose analyze;

Dependants of that lock:

locktype=relation relation=jobs
granted=false transaction=22/654227 pid=3321 mode=AccessShareLock
query = select id, to_char(time_stamp, 'YYYY-MM-DD HH24:MI:SS') as
time_stamp, injectd_log.log_pid as log_pid, injectd_host, injectd_pid, line
from injectd_log where log_pid in (select jobs.pid from jobs inner join
uploads using (upload_id) where account_id = E'1000-000' union (select pid
from uploads where account_id = E'1000-000')) and id > 27398675 order by id

locktype=relation relation=jobs
granted=false transaction=15/1241550 pid=32466 mode=AccessShareLock
query = select w.id as id, w.created_at as created_at, w.status as status,
w.timer as timer, w.error as error, w.priority as priority, w.work_type_id
as work_type_id, w.upload_id as upload_id, w.job_id as job_id,
w.batch_command_id as batch_command_id, w.mail_id as mail_id,
w.print_queue_id as print_queue_id from work_queue_available inner join
work_queue_data w on (work_queue_available.id = w.id) left outer join
batch_commands on (batch_command_id = batch_commands.id) left outer join
mail on (mail_id = mail.id) left outer join print_queue on (print_queue_id =
print_queue.id) left outer join uploads on (w.upload_id = uploads.upload_id)
left outer join jobs on (w.job_id = jobs.job_id) order by w.priority desc,
case when w.work_type_id = (select id from work_types where name = E'cron')
then 1 when w.work_type_id = (select id from work_types where name =
E'email') then 2 else 3 end, coalesce(batch_commands.created_at,
mail.created_at, print_queue.created_at, uploads.log_time, jobs.start_time),
id limit 28

locktype=relation relation=jobs
granted=true transaction=7/1931437 pid=27584 mode=AccessShareLock
query = <IDLE> in transaction

locktype=relation relation=jobs
granted=true transaction=8/2113080 pid=31961 mode=AccessShareLock
query = <IDLE> in transaction

----------

Lock:

locktype=relation relation=jobs
granted=false transaction=22/654227 pid=3321 mode=AccessShareLock
query = select id, to_char(time_stamp, 'YYYY-MM-DD HH24:MI:SS') as
time_stamp, injectd_log.log_pid as log_pid, injectd_host, injectd_pid, line
from injectd_log where log_pid in (select jobs.pid from jobs inner join
uploads using (upload_id) where account_id = E'1000-000' union (select pid
from uploads where account_id = E'1000-000')) and id > 27398675 order by id

Dependants of that lock:

locktype=relation relation=jobs
granted=false transaction=4/900 pid=10247 mode=AccessExclusiveLock
query = vacuum full verbose analyze;

locktype=relation relation=jobs
granted=false transaction=15/1241550 pid=32466 mode=AccessShareLock
query = select w.id as id, w.created_at as created_at, w.status as status,
w.timer as timer, w.error as error, w.priority as priority, w.work_type_id
as work_type_id, w.upload_id as upload_id, w.job_id as job_id,
w.batch_command_id as batch_command_id, w.mail_id as mail_id,
w.print_queue_id as print_queue_id from work_queue_available inner join
work_queue_data w on (work_queue_available.id = w.id) left outer join
batch_commands on (batch_command_id = batch_commands.id) left outer join
mail on (mail_id = mail.id) left outer join print_queue on (print_queue_id =
print_queue.id) left outer join uploads on (w.upload_id = uploads.upload_id)
left outer join jobs on (w.job_id = jobs.job_id) order by w.priority desc,
case when w.work_type_id = (select id from work_types where name = E'cron')
then 1 when w.work_type_id = (select id from work_types where name =
E'email') then 2 else 3 end, coalesce(batch_commands.created_at,
mail.created_at, print_queue.created_at, uploads.log_time, jobs.start_time),
id limit 28

locktype=relation relation=jobs
granted=true transaction=7/1931437 pid=27584 mode=AccessShareLock
query = <IDLE> in transaction

locktype=relation relation=jobs
granted=true transaction=8/2113080 pid=31961 mode=AccessShareLock
query = <IDLE> in transaction

----------

Lock:

locktype=relation relation=jobs
granted=false transaction=15/1241550 pid=32466 mode=AccessShareLock
query = select w.id as id, w.created_at as created_at, w.status as status,
w.timer as timer, w.error as error, w.priority as priority, w.work_type_id
as work_type_id, w.upload_id as upload_id, w.job_id as job_id,
w.batch_command_id as batch_command_id, w.mail_id as mail_id,
w.print_queue_id as print_queue_id from work_queue_available inner join
work_queue_data w on (work_queue_available.id = w.id) left outer join
batch_commands on (batch_command_id = batch_commands.id) left outer join
mail on (mail_id = mail.id) left outer join print_queue on (print_queue_id =
print_queue.id) left outer join uploads on (w.upload_id = uploads.upload_id)
left outer join jobs on (w.job_id = jobs.job_id) order by w.priority desc,
case when w.work_type_id = (select id from work_types where name = E'cron')
then 1 when w.work_type_id = (select id from work_types where name =
E'email') then 2 else 3 end, coalesce(batch_commands.created_at,
mail.created_at, print_queue.created_at, uploads.log_time, jobs.start_time),
id limit 28

Dependants of that lock:

locktype=relation relation=jobs
granted=false transaction=4/900 pid=10247 mode=AccessExclusiveLock
query = vacuum full verbose analyze;

locktype=relation relation=jobs
granted=false transaction=22/654227 pid=3321 mode=AccessShareLock
query = select id, to_char(time_stamp, 'YYYY-MM-DD HH24:MI:SS') as
time_stamp, injectd_log.log_pid as log_pid, injectd_host, injectd_pid, line
from injectd_log where log_pid in (select jobs.pid from jobs inner join
uploads using (upload_id) where account_id = E'1000-000' union (select pid
from uploads where account_id = E'1000-000')) and id > 27398675 order by id

locktype=relation relation=jobs
granted=true transaction=7/1931437 pid=27584 mode=AccessShareLock
query = <IDLE> in transaction

locktype=relation relation=jobs
granted=true transaction=8/2113080 pid=31961 mode=AccessShareLock
query = <IDLE> in transaction

To summarize the above output: there are three statements, each waiting on
the other two:

transaction 4/900: VACUUM FULL VERBOSE ANALYZE
transaction 22/654277: SELECT ... FROM INJECTD_LOG ...
transaction 15/1241550: SELECT ... FROM WORK_QUEUE_AVAILABLE ...

A "VACUUM VERBOSE ANALYZE" (without the "FULL") does not manifest this
anomaly. Furthermore, after doing a partial "VACUUM VERBOSE ANALYZE" (that
is, aborting it after it has gotten finished with some tables, but before it
has done all of them), a "VACUUM FULL VERBOSE ANALYZE" gets farther (though
more tables) before the lockup occurs. Specifically, the "VACUUM FULL
VERBOSE ANALYZE" does not get locked up until it hits a table that the
"VACUUM VERBOSE ANALYZE" did not get to.

Sadly, I have only made this happen in our production environment (very
active, very large set of data). No tidy test case wrapped up in a tarball
with ribbons and bows. I am submitting this bug report despite the clear
instructions to only submit bug reports against the latest minor version,
since I'm currently at the end of my rope (until I manage to reproduce this
in a non-production environment). I am hoping I can be forgiven for being
one minor version behind. If not, let me know--I've got a nice corner right
here to go stand in.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Ceriel Jacobs 2009-03-25 00:37:54 BUG #4731: compile with -fast fails on ld: duplicate symbol _PGP_S2K_TYPE in pgp.o and pgp-mpi-openssl.o
Previous Message Bruce Toll 2009-03-24 23:34:42 Re: BUG #4728: segfault with window function partition involving subquery