Re: PostgreSQL 8.4.8 bringing my website down every evening

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL 8.4.8 bringing my website down every evening
Date: 2011-05-26 14:39:06
Message-ID: BANLkTinyicaqri6qhTiOd6Ls1+Lp8unG4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, May 26, 2011 at 9:01 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> Dne 26.5.2011 11:41, Alexander Farber napsal(a):
>> Thank you, I'll try your suggestions.
>>
>> I'm just slow in doing so, because it's just a
>> (sometimes pretty time consuming) hobby-project.
>>
>> I'm missing knowledge on how to monitor my DB status,
>> i.e. how to check some of the things you've asked.
>
> OK, let me explain in a bit more detail. Merlin recommended those 5
> things to find out where the real bottleneck is (CPU, I/O, ...), because
> that's the necessary first step to fix it.
>
> 1. cpu bound? check top cpu usage during
>
>   Just run "top" and see what's going on when there are problems. If
>   the is 100% busy then the DB is CPU bound and you have to optimize
>   it so that it uses less CPU (or add faster/more CPUs).
>
>   It might be that most of the CPU is consumed by other processes
>   (e.g. Java doing GC) but still you need to find out if it's the case.
>
> 2. i/o bound? check top wait%
>
>   Run "top" and see what is the wait time. If you have more drives,
>   you can run "dstat" or "iostat -x" to see "per disk" stats. If the
>   wait/util values grow too much (beyond 50%), you're probably I/O
>   bound and you need to fix this.
>
> 3. scaling issues? # active connections over 20 or so can be
> dangerous.  consider installing a pooler (my favorite is pgbouncer).
> also monitor vmstat for context switches
>
>   Run "vmstat 1" and see the "cs" (context switch) column. The more
>   context switches happen, the more overhead that makes and the less
>   actual work can be done. So if you have too many active processes
>   (and each connection is a separate postgres backend process), this
>   may be a serious problem (unless the connections are idle).
>
>   The state of the connection can be seen from "ps ax" output - there
>   will be something like this:
>
>      5257 ?        Ss     0:00 postgres: pguser pgdb [local] idle
>
>   which means the connection is idle, or this
>
>      5257 ?        Rs     0:02 postgres: vampire pgmap [local] SELECT
>
>   when there's a query running.
>
>   Or you can use pg_stat_activity system view - the idle connections
>   will have "<IDLE>" in the "current_query" column.
>
> 4. lousy queries? enable min_duration_statement in logs and take note of
> queries running over 20-50ms
>
>   Poor SQL queries are often the real cause - you have to find out
>   which queries are slow (and then you can analyze why). The queries
>   can be obtained in two ways.
>
>   First you can set "log_min_duration_statement" in the config file,
>   and queries exceeding this number of miliseconds will be written
>   to the postgresql log. For example this
>
>      log_min_duration_statement = 250
>
>   will log all queries that take more than 250ms. Be careful not to
>   set it too low (I really wouldn't set it to 20ms right now), because
>   it means more I/O and it might make the problem even worse. Queries
>   start to slow down, more and more of them exceed this threshold and
>   need to be written, that means more I/O and that makes more queries
>   to run slow - you get the idea.
>
>   Or you could use the pg_stat_activity view again. Once the problems
>   happen log into psql and run this
>
>      select * from pg_stat_activity where current_query != '<IDLE>'
>                                  order by (now() - query_start) desc;
>
>   and you'll get list of currently running queries sorted by time.
>
> 5. something else? when are your backups running?  what else is
> happening at that time?
>
>   This just means the actual problem might be outside postgresql, e.g.
>   an intensive batch process / backup / ... consuming a lot of CPU,
>   I/O or other resources. That's it.

Great stuff.

>> Also I wonder, how's shared memory used by PostgreSQL.
>> I'm irritated - how it could work with 32MB,
>> but now I've got suggestion to increase it
>> to 512MB (and it seems to work too...)
>
> Shared buffers are a 'database cache'. When the DB needs a block from a
> file (because that's where the data are stored), it reads the data into
> the cache. When the same block is needed again, it may be read from the
> cache (which is much faster). Unless there's not enough space to hold
> all the blocks - in that case the block may be removed from the cache
> and will be read from the disk again.

*or the disk cache*. lowering shared buffers does not lower the
amount of ram in the system and thus does not lower the availability
of cache. If I may nitpick this point on your otherwise very
excellent email, this is exactly the type of thing that drives me
crazy about advice to raise shared buffers. It suggests you will get
less disk i/o which may or may not be the case (in fact, it can make
the i/o problem worse). If it does help i/o, it will probably not be
for the reasons you suspect. See my thread in -performance on this
topic.

merlin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2011-05-26 14:55:13 Re: When is commited data available
Previous Message Fredric Fredricson 2011-05-26 14:33:37 When is commited data available