[Paul Lathrop - Thu at 02:59:27PM -0800]
> growing disk space usage. The DBA had come to the conclusion that the
> VACUUM command did/does not work on these systems, because even after a
> VACUUM FULL, the size of the database was continually increasing. So, as
> things stand with the PG7.2 machines, vacuuming is run nightly, and
> whenever the database size reaches 40Gb on disk (the point at which
> performance has degraded below tolerance), the DBA exports the data,
> deletes the database, and then imports the data, shrinking it to the
> actual size of the dataset.
We found one reason why vacuuming didn't always work for us: we had
long-running transactions. Besides preventing VACUUM from reclaiming
rows, they did really nasty things to performance in general.
To check for those transactions, I believe you need to turn on
stats_command_string in the config.
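For reference, a minimal config fragment (note the parameter name is
version-dependent; stats_command_string was replaced by track_activities
in later releases, so check your version's docs):

```
# postgresql.conf
stats_command_string = on    # populate current_query in pg_stat_activity
```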
I use this query to check:
select * from pg_stat_activity where current_query<>'<IDLE>' order by query_start;
If you spot any "<IDLE> in transaction" with an old query_start
timestamp, then that's most probably the reason.
Long-running transactions don't have to be idle, though ... check the
pg_locks view for the lowest transaction id and compare (through the
pid) with the pg_stat_activity view to find the actual backend.
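That cross-check could look something like this (a sketch against the
8.1-era catalogs; the procpid column and the 'transactionid' lock type
are assumptions to verify against your version):

```sql
-- Sketch: find which backends hold transaction ids (8.1-era catalogs).
SELECT l.transactionid, a.procpid, a.usename, a.query_start, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
 WHERE l.locktype = 'transactionid'
   AND l.granted;
-- Eyeball the lowest transactionid; that backend is what holds VACUUM back.
```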
> However, we still are suffering a gradual decrease in performance over
> time - or so the application engineers claim. The DBA and I have been
> banging our heads against this for a month.
We're having the same issues, so we still do a dump and restore every
now and then to be sure everything is properly cleaned up - even with 8.1.
> 1) How does one define 'performance' anyway? Is it average time to
> complete a query? If so, what kind of query? Is it some other metric?
We have the same kind of problem, and the project leader (I sometimes
refer to him as the "bottleneck" ;-) is most concerned about the iowait
in our CPU graphs. Anyway, we do have other measures:
- Our application logs the duration of each request to the
  application as well as each query to the database. If a
  request (these are web servers) takes "too long", it's logged
  as an error instead of a debug message. If a significant number of
  such errors is due to database calls taking too much time, then the
  performance is bad. Unfortunately, we have no way to automate such
  checking.
- I've set up two scripts that poll the pg_stat_activity view every
  now and then, logging how much "gruff" they find there. Those two
  scripts are eventually to be merged; one simply logs what it
  finds, the other is a plugin for the Munin graphing package.
I've thrown the scripts we use out here:
(I had to rename them to .txt to get the web server to play along).
They are very much as-is, and should certainly be modified a bit to fit
any other production environment. :-)
The pg_activity_log script dumps a single number indicating the "stress
level" of the database to a file. I think this stress number, when
looking at, say, the worst 20% of the numbers from the file for each
day, can indicate something about the performance of the database
server. However, I haven't had the chance to discuss it with the
bottleneck yet.
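The script itself isn't reproduced here, but a guess at the kind of
counting it might do (the idea of tallying busy plus idle-in-transaction
backends is my assumption, not the script's actual logic):

```sql
-- Hypothetical "stress" snapshot: busy backends vs. idle-in-transaction ones.
SELECT sum(CASE WHEN current_query NOT IN ('<IDLE>', '<IDLE> in transaction')
                THEN 1 ELSE 0 END) AS busy,
       sum(CASE WHEN current_query = '<IDLE> in transaction'
                THEN 1 ELSE 0 END) AS idle_in_txn
  FROM pg_stat_activity;
```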
In response to "Defining performance." from Paul Lathrop, 2006-11-30, on pgsql-performance.