Re: Strange deadlock error last night

From: "Scott Whitney" <swhitney(at)journyx(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Strange deadlock error last night
Date: 2009-01-13 22:35:12
Message-ID: 20090113223407.BDF247E46BA@mail.int.journyx.com
Lists: pgsql-admin

Thanks for all the information, guys. I think Tom was right. Our application
was doing a couple of full vacs at the same time. It's weird that we didn't
run into this in the past.

You're all absolutely right about the upgrading, but in our environment,
it's not 2-3 minutes. It's 2-3 weeks. I've got to fully vet the app on the
platform internally with full test plans, etc, even for the most minor
upgrades; corp policy.

Right now, my effort is in going to the latest stable branch. Moving
forward, I will use these notes to get the company to revisit the minor
upgrade policy, though.

After all, when I _do_ get hit by one of those bugs, I _will_ be asked why
we weren't upgraded. *sigh*

-----Original Message-----
From: Scott Marlowe [mailto:scott(dot)marlowe(at)gmail(dot)com]
Sent: Tuesday, January 13, 2009 4:16 PM
To: Scott Whitney
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] Strange deadlock error last night

On Tue, Jan 13, 2009 at 10:37 AM, Scott Whitney <swhitney(at)journyx(dot)com>
wrote:

> It ended up locking up about 250 customer databases until I restarted the
> postmaster. This is version 8.1.4. Upgrading right now (even to a minor rev)
> is not really an option. This box has been up and running for 306 days. This
> postgres level has been installed for..err...well...at least Aug 9, 2006,
> based on some dates in the directories.

You need to ask yourself how much downtime you can afford. The 2 or 3
minutes every few months to go from 8.1.x to 8.1.x+1, or the half a
day of downtime when some horrendous bug takes down the whole site
because you didn't update it. Seriously, that unfrozen template0 bug
that Alvaro mentioned is one of those kinds of bugs.

Nothing like your db going down in the middle of the day with an error
message saying it's shutting down to prevent txid wraparound-induced data
loss, and to please run vacuum on all your databases in single-user mode.

If you can't set aside a minute or two at 0200 hrs, then don't be
surprised when you get one of those failures.
