Re: Help with performance problems

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Chris Hoover <revoohc(at)sermonaudio(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Help with performance problems
Date: 2004-04-23 19:58:51
Message-ID: Pine.LNX.4.33.0404231326490.27417-100000@css120.ihs.com
Lists: pgsql-performance

On Fri, 23 Apr 2004, Chris Hoover wrote:

> On Friday 23 April 2004 13:21, scott.marlowe wrote:
> > On Fri, 23 Apr 2004, Chris Hoover wrote:
> > > DB's on Powervaults 220S using raid 5 (over 6 disks)
> >
> > What controller is this, the adaptec? We've found it to be slower than
> > the LSI megaraid based controller, but YMMV.
> >
> We are using the perc3/di controller. Believe it is using the megaraid
> driver.

No, that's the Adaptec; the PERC3/DC is the LSI megaraid. See if there
are newer drivers for the RAID card. In terms of performance, both the
Adaptec and LSI drivers have improved considerably in later versions. In
terms of stability they've largely gotten better, with a few in-between
releases of the megaraid driver getting poor grades. The latest and
greatest from Dell is pretty up to date.

> > > Running RH ES 2.1
> >
> > Are you running the latest kernel for ES 2.1? Early 2.4 kernels are
> > pretty pokey and have some odd behaviour under load that later 2.4
> > kernels seemed to fix.
> >
> I'm not sure we are at the latest and greatest for 2.1, but I am trying to get
> there. Management won't let me do the upgrade w/o first testing/proving it
> will not cause any more issues. Due to all of the current issues, and the
> criticality of these systems to our bottom line, they are being very careful
> with any change that may impact our users further.

Understood. It's why my production box is still running a 2.4 kernel on
RH 7.2 with pg 7.2. They just work, and for us both stability AND
performance are good under our load.

You can install a new kernel and set the machine up to still boot off of
the old one, then test on the weekend to see how it behaves under
simulated load. Mining the logs for slow queries is a good way to build
that load.
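As a rough sketch of the log-mining part (log_min_duration_statement only
exists in 7.4 and later; on older releases you'd use whatever statement /
duration logging your version offers, e.g. log_statement and log_duration
where available, and the log path below is just a guess for a RH box):

  # postgresql.conf -- log any statement that runs longer than half a second
  log_min_duration_statement = 500        # in milliseconds, 7.4+

  # then pull candidates for the replay load out of the server log
  grep 'duration:' /var/lib/pgsql/data/serverlog

Replaying the worst of those against the new kernel over a weekend should
tell you pretty quickly whether it misbehaves under load.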

While we don't upgrade our production servers' applications to the latest
and greatest all the time (i.e. php or postgresql or openldap), we always
run the latest security patches, and I think the latest kernels had
security fixes for ES 2.1, so NOT upgrading is dangerous. Late-model
Linux kernels (the 2.0.x and 2.2.x series where x>20) tend to be VERY
stable and very conservatively backported and upgraded, so running a new
one isn't usually a big risk.

> > > Here is the postgresql.conf from the server with the 11GB db:
> > >
> > > max_connections = 64
> > > shared_buffers = 32768 # 256MB=32768(buffs)*8192(bytes/buff)
> > > max_fsm_relations = 1000 # min 10, fsm is free space map, ~40 bytes
> > > max_fsm_pages = 10000 # min 1000, fsm is free space map, ~6 bytes
> >
> > IF you're doing lots of updates and such, you might want these higher.
> > Have you vacuumed full the databases since taking over?
> >
> > > sort_mem = 4096 # 256MB=4096(bytes/proc)*64(procs or conns)
> >
> > Sorry, that's wrong. sort_mem is measured in kbytes, i.e. 8192 means 8
> > megs of sort_mem. Try setting it a bit higher (you've got LOTS of ram in
> > these boxes) to something like 16 or 32 meg.
> >
> > > checkpoint_segments = 16 # in logfile segments, min 1, 16MB each
> > > checkpoint_timeout = 30 # range 30-3600, in seconds
> > > effective_cache_size = 131072 # typically 8KB each
> >
> > This still looks low. On one machine you're showing kernel cache of about
> > .7 gig, on the other it's 6 gig. 6 gigs of kernel cache would be a
> > setting of 800000. It's more of a nudge factor than an exact science, so
> > don't worry too much.
> I believe changing this requires a restart of the cluster (correct?). If so,
> I'll try bumping up the effective_cache_size over the weekend.
>
> Also, will all of the memory available to these machines, should I be running
> with larger shared_buffers? It seems like 256M is a bit small.

No, you probably shouldn't. PostgreSQL doesn't "cache" in the classical
sense. If all backends close, the stuff they had in their buffers
disappears in a flash. So, it's generally considered better to let the
kernel do the bulk of the caching and to have that cache be large enough
to hold a large portion, if not all, of your working set of data. Between
PostgreSQL's own buffer management, which is dirt simple and works but
seems to have performance issues with large numbers of buffers, and the
fact that everything in shared memory disappears when the last backend
using it exits, very large shared_buffers settings generally aren't a win.

For instance, with the following seq scan select:

explain analyze select * from test;

where test is a ~10 megabyte table, the first time I ran it it took 5
seconds to run. The second time it took 2.5, the third 1.9, and it
levelled out around there. Starting up another backend and running the
same query got a 1.9 second response as well. Shutting down both
connections and running the query again, with only the kernel doing the
caching, I still got 1.9.

That's on a 2.4.2[2-4] kernel.
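To put rough numbers on that, here's an illustrative postgresql.conf
sketch for the 12 gig box (the figures are examples, not measurements from
your machines): keep shared_buffers modest and let effective_cache_size
reflect what free reports as kernel cache, at 8 KB per page:

  shared_buffers = 32768          # 256 MB, modest, per the above
  effective_cache_size = 786432   # ~6 GB of kernel cache / 8 KB pages (~800000)
  sort_mem = 16384                # 16 megs per sort, per the earlier suggestion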

> > If you've got fast I/O look at lowering random page cost to something
> > between 1 and 2. We use 1.3 to 1.4 on most of our machines with fast
> > drives under them.
> >
> > I'd use vmstat to see if you're I/O bound.
> >
> If we end up being I/O bound, should the random page cost be set higher?

Not necessarily. Oftentimes on a machine with a lot of memory you are
better off using index scans: where disk seek time would normally make
random access expensive, with the indexes in RAM the cost of a random page
compared to a sequential page is almost 1, with only a slight overhead.
So, lowering the random page cost generally favors indexes. If your I/O
subsystem is doing a lot of seq scans when only part of the data set is
ever really being worked on, that tends to flush out the kernel cache,
and we wind up going back to disk over and over. On the other hand, if
your data really is normally going to be accessed sequentially, then
you'll have to invest in better RAID hardware / more drives etc...

But with 12 gigs of RAM on one box and an already reasonably fast I/O
subsystem in place, I'd think a lower random page cost would help, not
hurt, performance.
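An easy, low-risk way to try it (table and column names here are made up):
set it for a single session and compare the plans and timings before
touching postgresql.conf:

  SET random_page_cost = 1.4;
  EXPLAIN ANALYZE SELECT * FROM some_big_table WHERE indexed_col = 42;
  -- if the planner switches to an index scan and it's faster, make the
  -- change permanent in postgresql.conf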

Have you explain analyzed your slower queries?
