Re: postgress 8.1.4 deadlocking??

From: Rafal Pietrak <rafal(at)zorro(dot)isa-geek(dot)com>
To: Casey Duncan <casey(at)pandora(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgress 8.1.4 deadlocking??
Date: 2006-09-27 20:00:13
Message-ID: 1159387214.11244.202.camel@zorro.isa-geek.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Ups, missed the list recepient itself.

Thenx Duncan for the analysis.

This happend again, so I'm able to peek at the details you've pointed
out.

On Wed, 2006-09-27 at 09:33 -0700, Casey Duncan wrote:
> Sounds like it was blocked (unsure by what). You can use pg_locks to
> check that.

That view currently contais 240 entries .... source of blocking. Now I
have to dig the ultimate reason why those locks end up there (and are
not being cleand). Any ideas?

Whould those locks go away by themselves .... in time?

> What was consuming all the CPU? I assume from what you wrote
> previously that it is a postgres backend. If so, what was it doing?

----------------ps output (just one backend)------------
postgres 4759 0.0 2.0 19368 5292 ? S 21:02 0:00
postgres: czyn_738773 czyn 127.0.0.1(55431) UPDATE waiting
-------------------------------------------------------

"czyn" is the name of the database, "czyn_738773" is database username.
IP address is localhost, since the database purpose is web serbvice
back-end.

Still, I'm not sure what that UPDATE counld that be.

> You can use ps, the pg_stat_activity view and strace to figure that out.

------------- exerpt from pg_stat_activity-----------------
17660 | czyn | 4759 | 19541 | czyn_738773 | <command string not
enabled> | | 2006-09-27 21:02:05.914728+02 | 127.0.0.1 |
55431
(18 rows)
------------------------------------------

But this is not very meaningfull to me.

> > ...another postgress-process restart, and another VACCUM ANALYSE, this
> > one completes in 3-5 secs. Now "ALTER GROUP ..." goes OK. And
> > everything
> > seams working.
>
> Was something else using the database previously? Was something else
> different this time than the last?

No. My impression was (naive, I know), that not all things were
correctly cleand-up on the first restart. Now I think, that on the
second go I was just a little bit more lucky - www service was calm for
a little longer. That was anough for VACUUM to complete.

Still, throuble started later.

> How big is the data on disk? Is it possible that you have queries

close to nothing: c.a. 200MiB

> problem (hardware, os, fs). When you saw this happen did you happen
> to check if it was user or system cpu usage? I had a corrupt

It's the user CPU time.

As it currently looks like some unknown UPDATE causes the trouble, how
can I identify what it is?
1. the database is not very sophisticated, apart from what I've
mentioned earlier, it does have just a few stored procedures in pgsql -
nothing fancy. And there is no triggers just specialized views (although
quite a lot of referencial integrity constrains).
2. machine is 'from the old days': 450MHz CPU
3. the system is currently under 'available-for-public' condition, so
there may be 'waves' (peeks) of activity.
4. when the system was tested in-lab (small number of users - no risk /
not-tested for havier load behavior), I haven't noticed this 'blocking'.

Any ideas where should I look now, to pinpoint the problem?
--
-R

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Karen Hill 2006-09-27 20:05:40 cyclical redundancy checksum algorithm(s)?
Previous Message Paolo Saudin 2006-09-27 19:29:42 Cumulative aggregate