Thomas Pöhler <tp(at)turtle-entertainment(dot)de> wrote:
> we are using two instances of pgbouncer v1.4 for connection
> pooling. One for prepared statements (pool_mode session) and one
> without (pool_mode transaction).
> max_client_conn = 10000
> default_pool_size = 450
Your best defense against the "thundering herd" issues you describe
would be to eliminate the session pool (if you can), and drop the
default_pool_size for the transaction pool to where at peak the
number of backends actually busy is about twice your number of
*actual* cores. (Don't count hyperthreading "logical" cores for
this purpose.) max_client_conn can be as high as you need; the
point is for the connection pool to funnel the requests through a
much smaller pool of database connections.
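As a rough sketch of what that looks like in pgbouncer.ini (the pool size of 16 assumes an 8-physical-core box; adjust to twice your actual core count):

```ini
[pgbouncer]
; Funnel many client connections through few database backends.
; default_pool_size = 2 x physical cores is the rule of thumb above;
; 16 assumes an 8-core machine and is only illustrative.
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 16
```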
> If I remember correctly vmstat showed lots of context switches
> during a peak above 50k.
Yeah, that's part of the reason throughput tanks when your active
connection count gets too high.
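If you want to watch that number outside of vmstat, here's a small Linux-only Python sketch that reads the kernel's cumulative context-switch counter from /proc/stat (the same source vmstat's "cs" column uses):

```python
# Estimate context switches per second from /proc/stat (Linux only).
# The "ctxt" line holds the cumulative count since boot; sampling it
# twice one second apart gives the rate vmstat reports as "cs".
import time

def context_switches() -> int:
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt"):
                return int(line.split()[1])
    raise RuntimeError("ctxt line not found in /proc/stat")

start = context_switches()
time.sleep(1)
rate = context_switches() - start
print(f"context switches/sec: {rate}")
```

Sustained rates in the tens of thousands, as described above, are a sign that too many backends are fighting for the CPUs at once.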
> We are running a biweekly downtime where we do a complete reindex
> and vacuum full. We cannot identify certain queries causing this.
If you really get bloat which requires VACUUM FULL, tracking down
the reason should be a high priority. You normally shouldn't need
to run that.
Also, I hope when you run that it is VACUUM FULL followed by
REINDEX, not the other way around. In fact, it would probably be
faster to CLUSTER (if you have room) or drop the indexes, VACUUM
FULL, and then create the indexes again.
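A sketch of those two alternatives, with hypothetical table, index, and column names (big_table, big_table_pkey, big_table_idx, some_column are placeholders for your schema):

```sql
-- Option 1: CLUSTER rewrites the table compactly and rebuilds its
-- indexes in one pass (needs roughly the table's size in free disk).
CLUSTER big_table USING big_table_pkey;

-- Option 2: drop the indexes first so VACUUM FULL doesn't churn
-- through bloated indexes, then rebuild them fresh afterward.
DROP INDEX big_table_idx;
VACUUM FULL big_table;
CREATE INDEX big_table_idx ON big_table (some_column);
```

Note that VACUUM FULL and CLUSTER cannot run inside a transaction block, so these are maintenance-window statements, not part of normal application code.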
> The last graph in ganglia
> (http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg) shows the
> avg_queries from pgbouncers stats. I think this is a symptom of
> many waiting queries which accumulate.
While it seems counter-intuitive, you're likely to have fewer
queries waiting a long time there if you reduce
default_pool_size so that contention doesn't kill performance when
the queries *do* get to run.
> max_connections 1000
This is what you need to try to reduce.
> max_prepared_transactions 5
If you're actually using prepared transactions, make sure none are
lingering about for a long time during these incidents. Well,
*ever*, really -- but I would definitely check during these problem
periods.
> wal_buffers 1MB
You should bump this to 16MB.
> The database is fitting completely into ram
Then you should probably be lowering seq_page_cost and
random_page_cost. You'll probably get plans which run faster, which
should help overall.
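Putting the last two suggestions together, a postgresql.conf sketch (the cost values are illustrative for a fully-cached database, not tuned numbers for your workload):

```ini
wal_buffers = 16MB       # up from 1MB
seq_page_cost = 0.1      # data is fully cached, so sequential and
random_page_cost = 0.1   # random reads cost about the same
```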
(Posted to the pgsql-performance mailing list.)