Re: Finding bottleneck

From: Kari Lavikka <tuner(at)bdb(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org, rjpeace(at)earthlink(dot)net
Subject: Re: Finding bottleneck
Date: 2005-08-19 11:34:47
Message-ID: Pine.HPX.4.62.0508191413480.3361@purple.bdb.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 8 Aug 2005, Tom Lane wrote:
> What that sounds like to me is a machine with inadequate disk I/O bandwidth.
> Your earlier comment that checkpoint drives the machine into the ground
> fits right into that theory, too. You said there is "almost no IO-wait"
> but are you sure you are measuring that correctly?

Reducing checkpoint_timeout to 600 seconds had a positive effect. Previous
value was 1800 seconds.

We have a spare disk array from the old server and I'm planning to use it
as a tablespace for the comment table (the 100M+ rows one) as Ron
suggested.

>> Queries accumulate and when checkpointing is over, there can be
>> something like 400 queries running but over 50% of cpu is just idling.
>
> 400 queries? Are you launching 400 separate backends to do that?
> Some sort of connection pooling seems like a good idea, if you don't
> have it in place already. If the system's effective behavior in the
> face of heavy load is to start even more concurrent backends, that
> could easily drive things into the ground.

Ok, I implemented connection pooling using pgpool and it increased
performance a lot! We are now delivering about 1500 dynamic pages a second
without problems. Each of the eight single-cpu webservers are running a
pgpool instance with 20 connections.

However, those configuration changes didn't have significant effect to
oprofile results. AtEOXact_CatCache consumes even more cycles. This isn't
a problem right now but it may be in the future...

CPU: AMD64 processors, speed 2190.23 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples % symbol name
1147870 21.1602 AtEOXact_CatCache
187466 3.4558 hash_seq_search
174357 3.2142 AllocSetAlloc
170896 3.1504 nocachegetattr
131724 2.4282 ExecMakeFunctionResultNoSets
125292 2.3097 SearchCatCache
117264 2.1617 StrategyDirtyBufferList
105741 1.9493 hash_search
98245 1.8111 FunctionCall2
97878 1.8043 yyparse
90932 1.6763 LWLockAcquire
83555 1.5403 LWLockRelease
81045 1.4940 _bt_compare
... and so on ...

----->8 Signigicant rows from current postgresql.conf 8<-----

max_connections = 768 # unnecessarily large with connection
pooling
shared_buffers = 15000
work_mem = 2048
maintenance_work_mem = 32768
max_fsm_pages = 1000000
max_fsm_relations = 5000
bgwriter_percent = 2
fsync = true
wal_buffers = 512
checkpoint_segments = 200 # less would probably be enuff with 600sec
timeout
checkpoint_timeout = 600
effective_cache_size = 500000
random_page_cost = 1.5
default_statistics_target = 150
stats_start_collector = true
stats_command_string = true

|\__/|
( oo ) Kari Lavikka - tuner(at)bdb(dot)fi - (050) 380 3808
__ooO( )Ooo_______ _____ ___ _ _ _ _ _ _ _
""

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Browne 2005-08-19 12:00:26 Re: Need for speed
Previous Message Bob Ippolito 2005-08-19 11:28:07 Re: sustained update load of 1-2k/sec