Re: Invulnerable VACUUM process thrashing everything

From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Russ Garrett <russ(at)garrett(dot)co(dot)uk>
Cc: pgperf <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Invulnerable VACUUM process thrashing everything
Date: 2005-12-30 02:16:02
Message-ID: 1135908962.2908.3.camel@toonses.gghcwest.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, 2005-12-29 at 22:53 +0000, Russ Garrett wrote:
> In my experience a kill -9 has never resulted in any data loss in this
> situation (it will cause postgres to detect that the process died, shut
> down, then recover), and most of the time it only causes a 5-10sec
> outage. I'd definitely hesitate to recommend it in a production context
> though, especially since I think there are some known race-condition
> bugs in 7.4.
>
> VACUUM *will* respond to a SIGTERM, but it doesn't check very often -
> I've often had to wait hours for it to determine that it's been killed,
> and my tables aren't anywhere near 1TB. Maybe this is a place where
> things could be improved...

FWIW, I murdered this process with SIGKILL, and the recovery was very
short.

> Incidentally, I have to kill -9 some of our MySQL instances quite
> regularly because they do odd things. Not something you want to be
> doing, especially when MySQL takes 30mins to recover.

Agreed. After mysql shutdown with MyISAM, all tables must be checked
and usually many need to be repaired. This takes a reallllllly long
time.

-jwb

> Russ Garrett
> Last.fm Ltd.
> russ(at)last(dot)fm
>
> Ron wrote:
>
> > Ick. Can you get users and foreign connections off that machine, lock
> > them out for some period, and renice the VACUUM?
> >
> > Shedding load and keeping it off while VACUUM runs high priority might
> > allow it to finish in a reasonable amount of time.
> > Or
> > Shedding load and dropping the VACUUM priority might allow a kill
> > signal to get through.
> >
> > Hope this helps,
> > Ron
> >
> >
> > At 05:09 PM 12/29/2005, Jeffrey W. Baker wrote:
> >
> >> A few WEEKS ago, the autovacuum on my instance of pg 7.4 unilaterally
> >> decided to VACUUM a table which has not been updated in over a year and
> >> is more than one terabyte on the disk. Because of the very high
> >> transaction load on this database, this VACUUM has been ruining
> >> performance for almost a month. Unfortunately is seems invulnerable to
> >> killing by signals:
> >>
> >> # ps ax | grep VACUUM
> >> 15308 ? D 588:00 postgres: postgres skunk [local] VACUUM
> >> # kill -HUP 15308
> >> # ps ax | grep VACUUM
> >> 15308 ? D 588:00 postgres: postgres skunk [local] VACUUM
> >> # kill -INT 15308
> >> # ps ax | grep VACUUM
> >> 15308 ? D 588:00 postgres: postgres skunk [local] VACUUM
> >> # kill -PIPE 15308
> >> # ps ax | grep VACUUM
> >> 15308 ? D 588:00 postgres: postgres skunk [local] VACUUM
> >>
> >> o/~ But the cat came back, the very next day ...
> >>
> >> I assume that if I kill this with SIGKILL, that will bring down every
> >> other postgres process, so that should be avoided. But surely there is
> >> a way to interrupt this. If I had some reason to shut down the
> >> instance, I'd be screwed, it seems.
> >
> >
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: Don't 'kill -9' the postmaster
> >
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-12-30 03:03:01 Re: Invulnerable VACUUM process thrashing everything
Previous Message Russ Garrett 2005-12-29 22:53:11 Re: Invulnerable VACUUM process thrashing everything