Re: PostgreSQL 8.4.8 bringing my website down every evening

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL 8.4.8 bringing my website down every evening
Date: 2011-05-26 14:01:24
Message-ID: 4DDE5D34.2000808@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dne 26.5.2011 11:41, Alexander Farber napsal(a):
> Thank you, I'll try your suggestions.
>
> I'm just slow in doing so, because it's just a
> (sometimes pretty time consuming) hobby-project.
>
> I'm missing knowledge on how to monitor my DB status,
> i.e. how to check some of the things you've asked.

OK, let me explain in a bit more detail. Merlin recommended those 5
things to find out where the real bottleneck is (CPU, I/O, ...), because
that's the necessary first step to fix it.

1. cpu bound? check top cpu usage during

Just run "top" and see what's going on when there are problems. If
the is 100% busy then the DB is CPU bound and you have to optimize
it so that it uses less CPU (or add faster/more CPUs).

It might be that most of the CPU is consumed by other processes
(e.g. Java doing GC) but still you need to find out if it's the case.

2. i/o bound? check top wait%

Run "top" and see what is the wait time. If you have more drives,
you can run "dstat" or "iostat -x" to see "per disk" stats. If the
wait/util values grow too much (beyond 50%), you're probably I/O
bound and you need to fix this.

3. scaling issues? # active connections over 20 or so can be
dangerous. consider installing a pooler (my favorite is pgbouncer).
also monitor vmstat for context switches

Run "vmstat 1" and see the "cs" (context switch) column. The more
context switches happen, the more overhead that makes and the less
actual work can be done. So if you have too many active processes
(and each connection is a separate postgres backend process), this
may be a serious problem (unless the connections are idle).

The state of the connection can be seen from "ps ax" output - there
will be something like this:

5257 ? Ss 0:00 postgres: pguser pgdb [local] idle

which means the connection is idle, or this

5257 ? Rs 0:02 postgres: vampire pgmap [local] SELECT

when there's a query running.

Or you can use pg_stat_activity system view - the idle connections
will have "<IDLE>" in the "current_query" column.

4. lousy queries? enable min_duration_statement in logs and take note of
queries running over 20-50ms

Poor SQL queries are often the real cause - you have to find out
which queries are slow (and then you can analyze why). The queries
can be obtained in two ways.

First you can set "log_min_duration_statement" in the config file,
and queries exceeding this number of miliseconds will be written
to the postgresql log. For example this

log_min_duration_statement = 250

will log all queries that take more than 250ms. Be careful not to
set it too low (I really wouldn't set it to 20ms right now), because
it means more I/O and it might make the problem even worse. Queries
start to slow down, more and more of them exceed this threshold and
need to be written, that means more I/O and that makes more queries
to run slow - you get the idea.

Or you could use the pg_stat_activity view again. Once the problems
happen log into psql and run this

select * from pg_stat_activity where current_query != '<IDLE>'
order by (now() - query_start) desc;

and you'll get list of currently running queries sorted by time.

5. something else? when are your backups running? what else is
happening at that time?

This just means the actual problem might be outside postgresql, e.g.
an intensive batch process / backup / ... consuming a lot of CPU,
I/O or other resources. That's it.

> Also I wonder, how's shared memory used by PostgreSQL.
> I'm irritated - how it could work with 32MB,
> but now I've got suggestion to increase it
> to 512MB (and it seems to work too...)

Shared buffers are a 'database cache'. When the DB needs a block from a
file (because that's where the data are stored), it reads the data into
the cache. When the same block is needed again, it may be read from the
cache (which is much faster). Unless there's not enough space to hold
all the blocks - in that case the block may be removed from the cache
and will be read from the disk again.

So it's about size of this cache - when you increase the cache more
reads will be resolved without actual I/O. But as Merlin noted, this may
not help when there's a lot of writes in your workload. It may actually
make the thing worse during checkpoint. I'd recommend to enable
"log_checkpoints" to see if this is a problem.

regards
Tomas

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Fredric Fredricson 2011-05-26 14:33:37 When is commited data available
Previous Message Merlin Moncure 2011-05-26 13:52:33 Re: Miidpoint between two long/lat points? (earthdistance?)