From: | Rafal Pietrak <rafal(at)zorro(dot)isa-geek(dot)com> |
---|---|
To: | Casey Duncan <casey(at)pandora(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: postgress 8.1.4 deadlocking?? |
Date: | 2006-09-27 20:00:13 |
Message-ID: | 1159387214.11244.202.camel@zorro.isa-geek.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Ups, missed the list recepient itself.
Thenx Duncan for the analysis.
This happend again, so I'm able to peek at the details you've pointed
out.
On Wed, 2006-09-27 at 09:33 -0700, Casey Duncan wrote:
> Sounds like it was blocked (unsure by what). You can use pg_locks to
> check that.
That view currently contais 240 entries .... source of blocking. Now I
have to dig the ultimate reason why those locks end up there (and are
not being cleand). Any ideas?
Whould those locks go away by themselves .... in time?
> What was consuming all the CPU? I assume from what you wrote
> previously that it is a postgres backend. If so, what was it doing?
----------------ps output (just one backend)------------
postgres 4759 0.0 2.0 19368 5292 ? S 21:02 0:00
postgres: czyn_738773 czyn 127.0.0.1(55431) UPDATE waiting
-------------------------------------------------------
"czyn" is the name of the database, "czyn_738773" is database username.
IP address is localhost, since the database purpose is web serbvice
back-end.
Still, I'm not sure what that UPDATE counld that be.
> You can use ps, the pg_stat_activity view and strace to figure that out.
------------- exerpt from pg_stat_activity-----------------
17660 | czyn | 4759 | 19541 | czyn_738773 | <command string not
enabled> | | 2006-09-27 21:02:05.914728+02 | 127.0.0.1 |
55431
(18 rows)
------------------------------------------
But this is not very meaningfull to me.
> > ...another postgress-process restart, and another VACCUM ANALYSE, this
> > one completes in 3-5 secs. Now "ALTER GROUP ..." goes OK. And
> > everything
> > seams working.
>
> Was something else using the database previously? Was something else
> different this time than the last?
No. My impression was (naive, I know), that not all things were
correctly cleand-up on the first restart. Now I think, that on the
second go I was just a little bit more lucky - www service was calm for
a little longer. That was anough for VACUUM to complete.
Still, throuble started later.
> How big is the data on disk? Is it possible that you have queries
close to nothing: c.a. 200MiB
> problem (hardware, os, fs). When you saw this happen did you happen
> to check if it was user or system cpu usage? I had a corrupt
It's the user CPU time.
As it currently looks like some unknown UPDATE causes the trouble, how
can I identify what it is?
1. the database is not very sophisticated, apart from what I've
mentioned earlier, it does have just a few stored procedures in pgsql -
nothing fancy. And there is no triggers just specialized views (although
quite a lot of referencial integrity constrains).
2. machine is 'from the old days': 450MHz CPU
3. the system is currently under 'available-for-public' condition, so
there may be 'waves' (peeks) of activity.
4. when the system was tested in-lab (small number of users - no risk /
not-tested for havier load behavior), I haven't noticed this 'blocking'.
Any ideas where should I look now, to pinpoint the problem?
--
-R
From | Date | Subject | |
---|---|---|---|
Next Message | Karen Hill | 2006-09-27 20:05:40 | cyclical redundancy checksum algorithm(s)? |
Previous Message | Paolo Saudin | 2006-09-27 19:29:42 | Cumulative aggregate |