WARNING: pgstat wait timeout

From: "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: WARNING: pgstat wait timeout
Date: 2010-01-20 23:20:08
Message-ID: alpine.LRH.2.00.1001210139500.15783@lnfm1.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

I've recently hit the message "WARNING: pgstat wait timeout" with PG
8.4.2.
I saw some reports about that message in the -bugs mailing list
http://archives.postgresql.org/pgsql-bugs/2009-12/msg00175.php
http://archives.postgresql.org/pgsql-bugs/2009-07/msg00081.php
where the backtrace from the statisctic collector was requested.
Although I don't have any other bad sympthoms in the system, I still
obtained a backtrace from the statistics collector process. Since I'm not
100% sure that the message is really a bug, feel free to ignore. But
if needed I have PG still running, so I can check something else if
needed.

Here is the (rather innocent IMHO) backtrace of the statistic collector
process:
(gdb) bt
#0 0x00007f31ddfc4b1f in poll () from /lib/libc.so.6
#1 0x00000000005bf7da in PgstatCollectorMain (argc=<value optimized out>,
# argv=<value optimized out>)
at pgstat.c:2718
#2 0x00000000005c0131 in pgstat_start () at pgstat.c:631
#3 0x00000000005c474d in reaper (postgres_signal_arg=<value optimized
out>)
#at postmaster.c:2322
#4 <signal handler called>
#5 0x00007f31ddfc6c83 in select () from /lib/libc.so.6
#6 0x00000000005c20fc in ServerLoop () at postmaster.c:1347
#7 0x00000000005c34a7 in PostmasterMain (argc=3, argv=0x144fd20) at
#postmaster.c:1040
#8 0x000000000056cdc8 in main (argc=3, argv=0x144fd20) at main.c:188
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /opt/pgsql/bin/postgres, process 24677
-------------

Bt full:

(gdb) bt full
#0 0x00007f31ddfc4b1f in poll () from /lib/libc.so.6
No symbol table info available.
#1 0x00000000005bf7da in PgstatCollectorMain (argc=<value optimized out>, argv=<value optimized out>)
at pgstat.c:2718
len = 64
msg = {msg_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, msg_dummy = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}}, msg_inquiry = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER,
m_size = 64}, inquiry_time = 0}, msg_tabstat = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64},
m_databaseid = 0, m_nentries = 0, m_xact_commit = 0, m_xact_rollback = 0, m_entry = {{t_id = 0, t_counts = {
t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 138, t_tuples_inserted = 138,
t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,
t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 2672, t_counts = {t_numscans = 1,
t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 1259, t_counts = {t_numscans = 4,
t_tuples_returned = 553, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 24, t_blocks_hit = 24}}, {t_id = 2615, t_counts = {t_numscans = 0,
t_tuples_returned = 0, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 1, t_blocks_hit = 1}}, {t_id = 2685, t_counts = {t_numscans = 1, t_tuples_returned = 1,
t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 2,
t_blocks_hit = 2}}, {t_id = 1172815, t_counts = {t_numscans = 0, t_tuples_returned = 0,
t_tuples_fetched = 0, t_tuples_inserted = 500000, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 500000, t_new_dead_tuples = 0, t_blocks_fetched = 23077,
t_blocks_hit = 15381}}, {t_id = 1172684, t_counts = {t_numscans = 0, t_tuples_returned = 0,
msg = {msg_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, msg_dummy = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}}, msg_inquiry = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER,
m_size = 64}, inquiry_time = 0}, msg_tabstat = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64},
m_databaseid = 0, m_nentries = 0, m_xact_commit = 0, m_xact_rollback = 0, m_entry = {{t_id = 0, t_counts = {
t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 138, t_tuples_inserted = 138,
t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,
t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 2672, t_counts = {t_numscans = 1,
t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 1259, t_counts = {t_numscans = 4,
t_tuples_returned = 553, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 24, t_blocks_hit = 24}}, {t_id = 2615, t_counts = {t_numscans = 0,
t_tuples_returned = 0, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched = 1, t_blocks_hit = 1}}, {t_id = 2685, t_counts = {t_numscans = 1, t_tuples_returned = 1,
t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 2,
t_blocks_hit = 2}}, {t_id = 1172815, t_counts = {t_numscans = 0, t_tuples_returned = 0,
t_tuples_fetched = 0, t_tuples_inserted = 500000, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 500000, t_new_dead_tuples = 0, t_blocks_fetched = 23077,
t_blocks_hit = 15381}}, {t_id = 1172684, t_counts = {t_numscans = 0, t_tuples_returned = 0,
t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 1018,
t_blocks_hit = 1017}}, {t_id = 2678, t_counts = {t_numscans = 5, t_tuples_returned = 1,
t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 5,
t_blocks_hit = 5}}, {t_id = 2619, t_counts = {t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 0,
t_tuples_inserted = 0, t_tuples_updated = 24, t_tuples_deleted = 0, t_tuples_hot_updated = 24,
t_new_live_tuples = 0, t_new_dead_tuples = 9, t_blocks_fetched = 72, t_blocks_hit = 72}}, {t_id = 2696,
t_counts = {t_numscans = 48, t_tuples_returned = 48, t_tuples_fetched = 48, t_tuples_inserted = 0,
t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,
t_new_dead_tuples = 0, t_blocks_fetched = 96, t_blocks_hit = 96}}}}, msg_tabpurge = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_nentries = 0, m_tableid = {0, 0, 0, 0, 0, 0,
0, 0, 138, 0, 138, 0 <repeats 11 times>, 2, 0, 2, 0, 2672, 32767, 1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0,
2, 0, 1259, 0, 4, 0, 553, 0 <repeats 15 times>, 24, 0, 24, 0, 2615, 0 <repeats 19 times>, 1, 0, 1, 0, 2685, 0,
1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1172815, 32561, 0, 0, 0, 0, 0, 0, 500000, 0, 0, 0, 0, 0, 0,
0, 500000, 0, 0, 0, 23077, 0, 15381, 0, 1172684, 0 <repeats 19 times>, 1018, 0, 1017, 0, 2678, 0, 5, 0, 1, 0,
1, 0 <repeats 13 times>, 5, 0, 5, 0, 2619, 3, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 24, 0, 0, 0, 9, 0, 72, 0,
---Type <return> to continue, or q <return> to quit---
72, 0, 2696, 32767, 48, 0, 48, 0, 48, 0 <repeats 13 times>...}}, msg_dropdb = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0}, msg_resetcounter = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0}, msg_autovacuum = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_start_time = 0}, msg_vacuum = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_tableoid = 0, m_analyze = 0 '\0',
m_autovacuum = 0 '\0', m_scanned_all = 0 '\0', m_vacuumtime = 0, m_tuples = 0}, msg_analyze = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_tableoid = 0, m_autovacuum = 0 '\0',
m_analyzetime = 0, m_live_tuples = 0, m_dead_tuples = 0}, msg_bgwriter = {m_hdr = {
m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_timed_checkpoints = 0, m_requested_checkpoints = 0,
m_buf_written_checkpoints = 0, m_buf_written_clean = 0, m_maxwritten_clean = 0, m_buf_written_backend = 138,
m_buf_alloc = 138}, msg_funcstat = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0,
m_nentries = 0, m_entry = {{f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 138,
f_numcalls = 138, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {
f_id = 2, f_numcalls = 140733193390704, f_time = 1, f_time_self = 1}, {f_id = 1, f_numcalls = 0, f_time = 0,
f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {f_id = 2, f_numcalls = 1259,
f_time = 4, f_time_self = 553}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0,
f_numcalls = 0, f_time = 0, f_time_self = 24}, {f_id = 24, f_numcalls = 2615, f_time = 0, f_time_self = 0}, {
f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0,
f_time_self = 1}, {f_id = 1, f_numcalls = 2685, f_time = 1, f_time_self = 1}, {f_id = 1, f_numcalls = 0,
f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {f_id = 2,
f_numcalls = 139848431297871, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 500000, f_time = 0,
f_time_self = 0}, {f_id = 0, f_numcalls = 500000, f_time = 0, f_time_self = 23077}, {f_id = 15381,
f_numcalls = 1172684, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0},
{f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 1018}, {f_id = 1017, f_numcalls = 2678, f_time = 5,
f_time_self = 1}, {f_id = 1, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0,
f_time = 0, f_time_self = 5}, {f_id = 5, f_numcalls = 12884904507, f_time = 0, f_time_self = 0}, {f_id = 0,
f_numcalls = 0, f_time = 24, f_time_self = 0}, {f_id = 24, f_numcalls = 0, f_time = 9, f_time_self = 72}, {
f_id = 72, f_numcalls = 140733193390728, f_time = 48, f_time_self = 48}, {f_id = 48, f_numcalls = 0,
f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 96}}}, msg_funcpurge = {
m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_nentries = 0, m_functionid = {0, 0,
0, 0, 0, 0, 0, 0, 138, 0, 138, 0 <repeats 11 times>, 2, 0, 2, 0, 2672, 32767, 1, 0, 1, 0, 1,
0 <repeats 13 times>, 2, 0, 2, 0, 1259, 0, 4, 0, 553, 0 <repeats 15 times>, 24, 0, 24, 0, 2615,
0 <repeats 19 times>, 1, 0, 1, 0, 2685, 0, 1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1172815, 32561, 0,
0, 0, 0, 0, 0, 500000, 0, 0, 0, 0, 0, 0, 0, 500000, 0, 0, 0, 23077, 0, 15381, 0, 1172684,
0 <repeats 19 times>, 1018, 0, 1017, 0, 2678, 0, 5, 0, 1, 0, 1, 0 <repeats 13 times>, 5, 0, 5, 0, 2619, 3, 0,
0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0, 24, 0, 0, 0, 9, 0, 72, 0, 72, 0, 2696, 32767, 48, 0, 48, 0, 48,
0 <repeats 13 times>...}}}
input_fd = {fd = 8, events = 9, revents = 0}
__func__ = "PgstatCollectorMain"
#2 0x00000000005c0131 in pgstat_start () at pgstat.c:631
curtime = <value optimized out>
pgStatPid = 0
__func__ = "pgstat_start"
#3 0x00000000005c474d in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2322
---Type <return> to continue, or q <return> to quit---
save_errno = 17
pid = 24673
exitstatus = 0
status = 0
__func__ = "reaper"
#4 <signal handler called>
#4 <signal handler called>
No symbol table info available.
#5 0x00007f31ddfc6c83 in select () from /lib/libc.so.6
No symbol table info available.
#6 0x00000000005c20fc in ServerLoop () at postmaster.c:1347
timeout = {tv_sec = 59, tv_usec = 916269}
rmask = {fds_bits = {120, 0 <repeats 15 times>}}
selres = <value optimized out>
readmask = {fds_bits = {120, 0 <repeats 15 times>}}
nSockets = 7
now = 64
last_touch_time = 1264024914
__func__ = "ServerLoop"
#7 0x00000000005c34a7 in PostmasterMain (argc=3, argv=0x144fd20) at postmaster.c:1040
opt = <value optimized out>
status = <value optimized out>
userDoption = <value optimized out>
__func__ = "PostmasterMain"
#8 0x000000000056cdc8 in main (argc=3, argv=0x144fd20) at main.c:188

----------

The message occurs with PG 8.4.2 on a powerful 16-CPU, 12Gb RAM machine
with RAID50 during heavy COPY/INSERTs/UPDATEs from one Java client (some
of the INSERTS actually take up to 100 sec). The OS is Linux Debian 5.0 with
2.6.30 kernel.

I see these WARNINGs appearing for the second time and as before I
saw them staring after I did a manual vacuum of one of the tables in which
the data is inserted. But I have no idea whether that's really a cause or
just a coincidence.

Regards,
S.

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge, UK
Tel: +44-1223-337-551
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math(at)sai(dot)msu(dot)ru

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-01-20 23:22:01 Re: Streaming Replication and archiving
Previous Message Josh Berkus 2010-01-20 23:04:20 HS/SR and smart shutdown