Re: Sudden slowdown of Pg server

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Jerry Sievers <jerry(at)jerrysievers(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sudden slowdown of Pg server
Date: 2006-01-20 22:13:58
Message-ID: 43D160A6.4020207@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


lockstat is available in Solaris 9. That can help you to determine if
there are any kernel level locks that are occuring during that time.
Solaris 10 also has plockstat which can be used to identify userland
locks happening in your process.

Since you have Solaris 9, try the following:

You can run (as root)
lockstat sleep 5
and note the output which can be long.

I guess "prstat -am" output, "iostat -xczn 3", "vmstat 3" outputs will
help also.

prstat -am has a column called "LAT", if the value is in double digits,
then you have a locking issue which will probably result in higher "SLP"
value for the process. (Interpretation is data and workload specific
which this email is too small to decode)

Once you have identified a particular process (if any) to be the source
of the problem, get its id and you can look at the outputs of following
command which (quite intrusive)
truss -c -p $pid 2> truss-syscount.txt

(Ctrl-C after a while to stop collecting)

truss -a -e -u":::" -p $pid 2> trussout.txt

(Ctrl-C after a while to stop collecting)

Regards,
Jignesh

Jerry Sievers wrote:

>Hello;
>
>I am going through a post mortem analysis of an infrequent but
>recurring problem on a Pg 8.0.3 installation. Application code
>connects to Pg using J2EE pooled connections.
>
> PostgreSQL 8.0.3 on sparc-sun-solaris2.9, compiled by GCC sparc-sun-solaris2.8-gcc (GCC) 3.3.2
>
>Database is quite large with respect to the number of tables, some of
>which have up to 6 million tuples. Typical idle/busy connection ratio
>is 3/100 but occationally we'll catch 20 or more busy sessions.
>
>The problem manifests itself and appears like a locking issue. About
>weekly throuput slows down and we notice the busy connection count
>rising minute by minute. 2, 20, 40... Before long, the app server
>detects lack of responsiveness and fails over to another app server
>(not Pg) which in turn attempts a bunch of new connections into
>Postgres.
>
>Sampling of the snapshots of pg_locks and pg_stat_activity tables
>takes place each minute.
>
>I am wishing for a few new ideas as to what to be watching; Here's
>some observations that I've made.
>
>1. At no time do any UN-granted locks show in pg_locks
>2. The number of exclusive locks is small 1, 4, 8
>3. Other locks type/mode are numerous but appear like normal workload.
>4. There are at least a few old '<IDLE> In Transaction' cases in
> activity view
>5. No interesting error messages or warning in Pg logs.
>6. No crash of Pg backend
>
>Other goodies includes a bounty of poor performing queries which are
>constantly being optimized now for good measure. Aside from the heavy
>queries, performance is generallly decent.
>
>Resource related server configs have been boosted substantially but
>have not undergone any formal R&D to verify that we're inthe safe
>under heavy load.
>
>An max_fsm_relations setting which is *below* our table and index
>count was discovered by me today and will be increased this evening
>during a maint cycle.
>
>The slowdown and subsequent run-away app server takes place within a
>small 2-5 minute window and I have as of yet not been able to get into
>Psql during the event for a hands-on look.
>
>Questions;
>
>1. Is there any type of resource lock that can unconditionally block
> another session and NOT appear as UN-granted lock?
>
>2. What in particular other runtime info would be most useful to
> sample here?
>
>3. What Solaris side runtime stats might give some clues here
> (maybe?)( and how often to sample? Assume needs to be aggressive
> due to how fast this problem crops up.
>
>Any help appreciated
>
>Thank you
>
>
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Steinar H. Gunderson 2006-01-20 22:16:55 Re: [GENERAL] Creation of tsearch2 index is very slow
Previous Message Tom Lane 2006-01-20 21:50:17 Re: [GENERAL] Creation of tsearch2 index is very slow