Re: Increasing number of PG connections.

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Kevin Barnard <kbarnard(at)speedfc(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Increasing number of PG connections.
Date: 2004-02-02 23:45:42
Message-ID: Pine.LNX.4.33.0402021631120.20659-100000@css120.ihs.com
Lists: pgsql-performance

On Mon, 2 Feb 2004, Kevin Barnard wrote:

> On 2 Feb 2004 at 13:58, scott.marlowe wrote:
>
> > what do you mean at 2 GB? Is that how much is in kernel cache plus
> > buffer, plus used, plus etc??? Could you give us the top of top output to
> > make sure? If most of that is kernel cache, then that's fine.
>
> 2GB was total system memory. We upgraded to 4GB prior to increasing the
> number of connections.

Oh, ok. I thought you meant the system was using 2 gigs of RAM for
postgresql.

> Here's the top of top
>
> 16:14:17 up 2 days, 16:15, 1 user, load average: 7.60, 6.56, 4.61
> 730 processes: 721 sleeping, 9 running, 0 zombie, 0 stopped
> CPU states: cpu user nice system irq softirq iowait idle
> total 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
> cpu00 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
> cpu01 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
> cpu02 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
> cpu03 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
> Mem: 3747644k av, 3298344k used, 449300k free, 0k shrd, 147880k buff
> 2158532k active, 760040k inactive
> Swap: 1048088k av, 0k used, 1048088k free 2262156k cached

When you have a high load but low CPU usage, you are usually I/O bound.
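
A quick way to confirm is to watch the disk activity directly. A minimal
check, using nothing beyond the stock procps tools:

    $ vmstat 1 5
    # 'b'  column = processes blocked in uninterruptible (disk) sleep
    # 'bi'/'bo'   = blocks read in / written out per second; big numbers
    #              here with an idle CPU back up the I/O-bound diagnosis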

> The DB is pretty close to max connections at this point in time. I don't know why
> CPU shows 0% in every bucket. It looks like I can increase the number of
> connections a little from here. This is a fairly standard Fedora install. It's using
> version 2.4.22 of the kernel. Postgres is a compiled version of 7.4.1

On this machine you could probably handle even more. What I want is to
get your page return times down enough so you don't need to increase the
number of connections. I.e. if you've got 2 second response times and you
drop those to 0.2 seconds, then you won't need as many processes to handle
the load (theoretically... :-)
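
Back of the envelope, assuming the request rate stays roughly constant
(a big assumption, but it shows the shape of it):

    700 connections / 2.0 sec per request  = ~350 requests/sec arriving
    350 requests/sec * 0.2 sec per request = ~70 connections needed

That's just Little's law: concurrency = arrival rate * response time.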

> > experience has been that individual postgresql backends only weigh in at a
> > megabyte at most, and they share buffers, so 700 connections can be
> > anywhere from 300 meg to 1 gig. The rest would be buffer memory. It's not
> > a good idea to give up too much to shared buffers, as the database isn't
> > as good at caching as the kernel.
>
> OK, I take this as meaning I should keep shared buffers around 2x connections then, correct?

Not really. What happens is that if the shared buffers are so large that
they are as large as, or, God forbid, larger than the kernel cache, then the
kernel cache becomes less effective. The general rule of thumb is 25% of
memory, or 256 megs, whichever is less. The real test is that you want
enough shared_buffers so that all the result sets currently being smooshed
up against each other in joins, sorts, etc... can fit in postgresql's
shared buffers, or at least the buffers can hold a fair chunk of them. So,
the number of buffers can be anywhere from a few thousand, up to 40000 or
50000, sometimes even higher. But for most tuning you won't need to
be above 32768, which is 256 megs of RAM.
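
For reference, each buffer is 8k (the default BLCKSZ), so the arithmetic
is just buffers times 8k:

    shared_buffers =  1500  ->  1500 * 8k = ~12 megs   (your current setting)
    shared_buffers = 32768  -> 32768 * 8k = 256 megs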

> > What do you have in postgresql.conf? sort_mem, shared_buffers, etc???
>
> Here is what I have that is not set from the defaults.
>
> max_connections = 700
> shared_buffers = 1500
> sort_mem = 512
> random_page_cost = 2
> stats_start_collector = true
> stats_command_string = true
> stats_block_level = true
> stats_row_level = true
>
>
> > sort_mem can be a real killer if it lets the processes chew up too much
> > memory. Once sort_mem gets high enough to make the machine start swapping
> > it is doing more harm than good being that high, and should usually be
> > lowered a fair bit.
>
> I dropped it down to 512 as you can see. Should I be running with all of the stats on?
> I am no longer using pg_autovacuum. I seem to be getting better results with an
> hourly vacuum analyse.

Seeing as how top shows 2262156k kernel cache, you can afford to give up a
fair bit more than 512k per sort. I generally run 8192 (8 meg) but I
don't handle 700 simultaneous users. Try running it a little higher, 2048,
4096, etc... and see if that helps. Note you can change sort_mem and just
do a pg_ctl reload to make the change, without interrupting service, unlike
shared_buffers, which requires a restart.
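
For example (the data directory path here is just the usual Fedora default,
adjust to wherever yours lives):

    $ vi /var/lib/pgsql/data/postgresql.conf     # set: sort_mem = 4096
    $ pg_ctl reload -D /var/lib/pgsql/data       # SIGHUP, no restart needed

New backends pick it up right away, and since sort_mem is a per-session
setting you can also try values with SET sort_mem = 4096 in a session
before committing one to the config file.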

> > How many disks in your RAID5? The more the better. Is it hardware with
> > battery backed cache? If you write much to it, it will help to have
> > battery backed cache on board. If it's a megaraid / LSI board, get the
> > megaraid2 driver, it's supposedly much faster.
>
> 4 disk IBM ServeRAID 5i with battery backed cache.

Do you have the cache set to write back or write through? Write through
can be a performance killer. But I don't think your RAID is the problem,
it looks to me like postgresql is doing a lot of I/O. When you run top,
do the postgresql processes show a lot of D status? That's usually waiting
on I/O.
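
A quick way to spot them outside of top (standard procps ps flags):

    $ ps -eo state,comm | grep '^D'     # list processes in disk wait
    $ ps -eo state | grep -c '^D'       # or just count them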

> > You may find it hard to get postgresql to use any more memory than you
> > have, as 32 bit apps can only address 2 gigs anyway, but the extra can
> > certainly be used by the kernel as cache, which will help.
>
> Isn't that only true for each individual process space? Shouldn't each process have
> access to at most 2GB? If each backend is in its own process space, is this really a
> limit, since all of my queries are pretty small?

Right, each process can use a big chunk, but shared_buffers will top out
at ~2 gig. Most tests have shown a negative return on a shared_buffers
setting that big though, so the nicest thing about the extra memory is
that the kernel can use it to cache, AND you can increase your sort_mem to
something larger.

> I have been monitoring the system as it gets up to load. For most of the time the
> system sits around 100-300 connections. Once it ramps up it ramps up hard. Top
> starts cycling at 0 and 133% CPU for irq, softirq and iowait. The system stays at 700
> connections until users give up. I can watch bandwidth utilization drop to almost
> nothing right before the DB catches up.

Yeah, it sounds to me like it's grinding to a halt because postgresql
isn't able to hold enough data in memory for what it's doing.

Try increasing shared_buffers from 5000 up to as high as 30000 or 50000, a
bit at a time, as well as increasing sort_mem to 4096 or 8192. Note that
increasing shared_buffers will have a very positive effect on performance
at first, then less effect, then slowly bring it back down as it goes too
high, but it isn't likely to starve the machine (64k buffers = 512 meg,
you've got the memory to spare, so no great loss).

However, sort_mem will have a huge effect right up until it's big enough
for all your sorts to fit into memory. Once that happens, increasing it
won't help or hurt UNTIL the machine gets enough load to make the sorts
use up all memory and send it into a swap storm, so be careful about
overdoing sort_mem.

I.e. shared_buffers too big = no big deal; sort_mem too big = time bomb.

What you want to do is get the machine to a point where the kernel cache
is about twice the size of shared_buffers, or larger. I'd start at
10000 shared buffers and 4096 sort_mem and see what happens. If you've
still got >2 gig kernel cache at that point, then increase both a bit (2x
or so) and see how much kernel cache you've got. If your kernel cache
stays above 1 gig, and the machine is running faster, you're doing pretty
good.
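
You can watch that number with free; the figures below are just your own
top output converted to megs:

    $ free -m
                 total   used   free   shared   buffers   cached
    Mem:          3660   3221    439        0       144      2209

The 'cached' column is the kernel cache you want to keep up around 2 gig.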

You may need to increase shmmax and friends to set shared_buffers
that high, but sort_mem requires no new kernel configuration.
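
On Linux that's a pair of sysctls; shmmax is in bytes and shmall is in 4k
pages. This example allows ~512 megs of shared memory, plenty for 50000
buffers:

    $ sysctl -w kernel.shmmax=536870912
    $ sysctl -w kernel.shmall=131072
    # put the same values in /etc/sysctl.conf so they survive a reboot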
