Re: Help: 8.0.3 Vacuum of an empty table never completes ...

From: James Robinson <jlrobins(at)socialserve(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Help: 8.0.3 Vacuum of an empty table never completes ...
Date: 2005-11-28 22:03:56
Message-ID: 8C696945-49D6-4872-8A24-55CEA8BA8C1D@socialserve.com
Lists: pgsql-hackers


On Nov 28, 2005, at 4:13 PM, Tom Lane wrote:

> Yeah, could be. Anyway it doesn't seem like we can learn much more
> today. You might as well just zing the vacuumdb process and let
> things get back to normal. If it happens again, we'd have reason
> to dig deeper.

Final report [ and apologies to the hackers list in general; sorry for
the noise today ].

Killed the vacuumdb frontend. Then went off killing the processes that
cron had spawned on Nov 25th for that cron job. All of the related
backends exited peacefully, and all is well. A manual VACUUM VERBOSE
ANALYZE now completes successfully.

One possibly curious thing: one final process dated Nov 25 remains on
the backup box:

root 19912 3 0 Nov25 ? 00:00:12 [pdflush]

Coincidence? This is some sort of kernel thread, right? Flushes dirty
pages to disk? There are two on this machine:

root 9211 3 0 Nov22 ? 00:02:56 [pdflush]
root 19912 3 0 Nov25 ? 00:00:12 [pdflush]

The Nov 25th pdflush's PID is suspiciously close to the PIDs that
would have been in use around the start of the cron'd process.
[ checks /var/log/messages ... ] Yep, real close. The last
cross-referenceable PID is:

Nov 25 04:59:01 db02 /usr/sbin/cron[20590]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)

and the vacuumdb sshd connection on the production db box is logged
at 05:02:22 AM, so that pdflush would have been started very close to
the time the remote backup + vacuum script was running.
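The by-hand cross-referencing above (find the logged PID numerically nearest the suspect pdflush PID, then read off its timestamp) can be sketched in a few lines of Python. This is an illustrative sketch, not something from the thread: it assumes classic syslog lines of the "Mon DD HH:MM:SS host prog[pid]:" form shown above, and `nearest_logged_pid` is a hypothetical helper. PID proximity only hints at timing because Linux hands out PIDs in increasing order until it wraps at pid_max.

```python
import re

# Assumed layout: "Nov 25 04:59:01 db02 /usr/sbin/cron[20590]: ..."
# Lines in any other format simply do not match and are skipped.
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<prog>[^\s\[]+)\[(?P<pid>\d+)\]:"
)


def nearest_logged_pid(lines, target_pid):
    """Hypothetical helper: return (timestamp, program, pid) for the logged
    entry whose PID is numerically closest to target_pid, or None if no
    line carries a PID."""
    best = None
    for line in lines:
        m = SYSLOG_RE.match(line)
        if not m:
            continue
        pid = int(m.group("pid"))
        if best is None or abs(pid - target_pid) < abs(best[2] - target_pid):
            best = (m.group("ts"), m.group("prog"), pid)
    return best


if __name__ == "__main__":
    sample = [
        "Nov 25 04:59:01 db02 /usr/sbin/cron[20590]: (root) CMD "
        "( rm -f /var/spool/cron/lastrun/cron.hourly)",
    ]
    # Compare against the PID of the Nov 25 pdflush thread from ps.
    print(nearest_logged_pid(sample, 19912))
```

On a real box one would pass the lines of /var/log/messages (e.g. `open("/var/log/messages")`) instead of the one-line sample.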

Any Linux 2.6 gurus lurking? Under what circumstances do new pdflush
threads get spawned? The filesystem the outputs were going to is a
software RAID partition (I always confuse RAID-0 and RAID-1; it's the
striped/interleaved one anyway, not the mirrored one), formatted
ReiserFS 3.

Neither pdflush instance on this machine was started anywhere near
the machine's boot time; both started much later. On the production
box, by contrast, both pdflush instances date from boot time. Could
this perhaps indicate some hardware-level unhappiness afoot?
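The boot-time comparison above can also be made precise rather than read off `ps` dates. The following is a minimal sketch, not from the thread, assuming the Linux /proc layout documented in proc(5): field 22 of /proc/<pid>/stat is the start time in clock ticks since boot, and the `btime` line of /proc/stat is the boot time in seconds since the epoch. The helper names (`parse_stat`, `boot_time`, `threads_named`) are hypothetical. Note that pdflush only exists on older 2.6 kernels; later kernels replaced it with per-device flusher threads, so there the scan prints nothing.

```python
import os
import time


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (comm, starttime_ticks).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace."""
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    rest = stat_line[close_paren + 1:].split()
    # rest[0] is field 3 (state), so field 22 (starttime) is rest[19].
    return comm, int(rest[19])


def boot_time():
    """Boot time in seconds since the epoch, from the 'btime' line of /proc/stat."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("btime "):
                return int(line.split()[1])
    raise RuntimeError("btime not found in /proc/stat")


def threads_named(name):
    """Yield (pid, seconds_after_boot) for every process whose comm equals name."""
    hz = os.sysconf("SC_CLK_TCK")
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, ticks = parse_stat(f.read())
        except OSError:
            continue  # process exited (or is unreadable) while we scanned
        if comm == name:
            yield int(entry), ticks / hz


if __name__ == "__main__":
    if not os.path.exists("/proc/stat"):
        print("no /proc filesystem; this sketch is Linux-only")
    else:
        bt = boot_time()
        for pid, secs in threads_named("pdflush"):
            started = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(bt + secs))
            print(f"pid {pid}: started {secs:.0f}s after boot ({started})")
```

A thread started at boot shows only a few seconds after boot; the Nov 22 and Nov 25 threads on the backup box would instead show offsets of days.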

----
James Robinson
Socialserve.com
