Re: Help: 8.0.3 Vacuum of an empty table never completes ...

From: James Robinson <jlrobins(at)socialserve(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Help: 8.0.3 Vacuum of an empty table never completes ...
Date: 2005-11-28 17:47:20
Message-ID: D7140CD2-5D64-4958-819D-F1B6533B7C5C@socialserve.com
Lists: pgsql-hackers


On Nov 28, 2005, at 12:00 PM, Tom Lane wrote:

> Your next move is to look at the state of sshd
> and whatever is running at the client end of the ssh tunnel.

The backtrace of the sshd process doesn't look good:

(gdb) bt
#0 0xffffe410 in ?? ()
#1 0xbfffdb48 in ?? ()
#2 0x080a1e28 in ?? ()
#3 0x080a1e78 in ?? ()
#4 0xb7d379fd in ___newselect_nocancel () from /lib/tls/libc.so.6
#5 0x08054d64 in ?? ()
#6 0x0000000a in ?? ()
#7 0x080a1e78 in ?? ()
#8 0x080a1e28 in ?? ()
#9 0x00000000 in ?? ()
#10 0xbfffdb30 in ?? ()
#11 0x00000000 in ?? ()
#12 0xbfffdb48 in ?? ()
#13 0x0806c796 in ?? ()
#14 0x080a9d3c in ?? ()
#15 0x00000001 in ?? ()
#16 0xbfffdb64 in ?? ()
#17 0x08054c3d in ?? ()
#18 0x00000019 in ?? ()
#19 0x000acda0 in ?? ()
#20 0x080a9d3c in ?? ()
#21 0x00000000 in ?? ()
#22 0xbfffdb6c in ?? ()
#23 0x00000000 in ?? ()
#24 0xbfffdb78 in ?? ()
#25 0x08055632 in ?? ()
#26 0xbfffdb6c in ?? ()
#27 0x00000000 in ?? ()
#28 0x080a1e78 in ?? ()
#29 0x08098ee8 in ?? ()
#30 0x080a1e78 in ?? ()
#31 0x080a1e28 in ?? ()
#32 0x00000009 in ?? ()
#33 0x00000004 in ?? ()
#34 0x00000001 in ?? ()
#35 0x00000001 in ?? ()
#36 0xbfffdbb8 in ?? ()
#37 0x0805b816 in ?? ()
#38 0x08098ee8 in ?? ()
#39 0x080a2e10 in ?? ()
#40 0x00000007 in ?? ()
#41 0x08098ee8 in ?? ()
#42 0x08080fd2 in _IO_stdin_used ()
#43 0x08098ee8 in ?? ()
#44 0xbfffdbb8 in ?? ()
#45 0x080574a3 in ?? ()
#46 0x00000000 in ?? ()
#47 0x08098ee8 in ?? ()
#48 0x08098ee8 in ?? ()
#49 0x08098f30 in ?? ()
#50 0x08080fd2 in _IO_stdin_used ()
#51 0x08098ee8 in ?? ()
#52 0xbfffeb98 in ?? ()
#53 0x0804fc90 in ?? ()
#54 0x08098ee8 in ?? ()
#55 0x08098f74 in ?? ()
#56 0x08098f30 in ?? ()
#57 0xbfffe110 in ?? ()
#58 0xbfffe110 in ?? ()
#59 0x0808014a in _IO_stdin_used ()
#60 0xb7ffad95 in malloc () from /lib/ld-linux.so.2
Previous frame inner to this frame (corrupt stack?)

The client-side ssh is worse -- 507 frames before it reports
'(corrupt stack?)'.

Should we now kill off the offending processes from Nov 25, starting
from the client-most side and working all the way back to the
vacuumdb process on the production server? The other vacuums would
probably then complete happily, and we'd be cool again, eh?

I suppose we're darn lucky the process ultimately got gummed up on a
table that sees no traffic at all, eh? The lock that vacuum has
taken out on it would prevent at least some things from happening to
the table in question -- possibly even new inserts or updates?

Could this be alleviated in the future by a little code reordering in
vacuumdb or the postmaster, so that work on the current table
finishes completely before any output is emitted, either postmaster ->
vacuumdb client or vacuumdb client -> wherever stdout is directed?
That way, if output does get gummed up, it happens in a state where
no locks are being held. Or would that uglify the code too much,
and/or would people find the additional buffering a damnable offense?
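
To make the reordering idea concrete, here is a minimal sketch in Python. It is not how vacuumdb or the backend is actually structured; process_tables, CheckingSink, and the message strings are all hypothetical names invented for illustration. It only shows the general pattern: collect progress messages while the lock is held, and write them to a possibly-blocking sink only after the lock is released.

```python
import io
import threading


def process_tables(tables, sink, lock):
    """Do per-table work under `lock`, writing progress only after release.

    Hypothetical stand-in for the reordering idea; not PostgreSQL code.
    """
    for name in tables:
        buffered = []  # messages produced while the lock is held
        with lock:
            # Stand-in for the real per-table work. Nothing is written to
            # the sink here, so a stalled reader cannot block us while the
            # lock is held.
            buffered.append(f"INFO: vacuuming {name}\n")
            buffered.append(f"INFO: done with {name}\n")
        # Lock released: if this write blocks, only the output stalls.
        sink.write("".join(buffered))


class CheckingSink(io.StringIO):
    """Demo sink that records whether the lock was held at each write."""

    def __init__(self, lock):
        super().__init__()
        self.lock = lock
        self.held_during_write = []

    def write(self, text):
        self.held_during_write.append(self.lock.locked())
        return super().write(text)


if __name__ == "__main__":
    lock = threading.Lock()
    sink = CheckingSink(lock)
    process_tables(["t1", "t2"], sink, lock)
    print(sink.held_during_write)
```

The cost is the one raised above: each table's messages have to be held in memory until its work is finished, and the reader sees no progress for a table until then.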

----
James Robinson
Socialserve.com
