Skip site navigation (1) Skip section navigation (2)

Re: Scaling concerns

From: Andreas Kostyrka <andreas(at)kostyrka(dot)org>
To: tsuraan <tsuraan(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Scaling concerns
Date: 2006-12-17 03:25:27
Message-ID: 20061217032527.GL29195@andi-lap.la.revver.com (view raw or flat)
Thread:
Lists: pgsql-performance
* tsuraan <tsuraan(at)gmail(dot)com> [061216 18:26]:
> I'm writing a webmail-type application that is meant to be used in a
> corporate environment.  The core of my system is a Postgres database
> that is used as a message header cache.  The two (relevant) tables
> being used are pasted into the end of this message.  My problem is
> that, as the messages table increases to tens of millions of rows,
> pgsql slows down considerably.  Even an operation like "select
> count(*) from messages" can take minutes, with a totally idle system.
> Postgres seems to be the most scalable Free database out there, so I
> must be doing something wrong.

select count(*) from table is the worst case in PostgreSQL. (MVC
systems in general I guess).

If you really need to run count(*) you need to think about the
required isolation level of these operations and make some aggregate
table yourself.

(btw, select aggregate(*) from bigtable is something that no database
likes, it's just the degree of slowness that sometimes is different).

For scaling you should consider slony. Either hangout on #slony on
Freenode.net or ask on the mailing list if you have questions.

> As for the most common strategy of having a slower (more rows)
> "archival" database and a smaller, faster "live" database, all the
> clients in the company are using their normal corporate email server
> for day-to-day email handling.  The webmail is used for access email
> that's no longer on the corporate server, so it's not really simple to
> say which emails should be considered live and which are really
> out-of-date.
> 
> My postgres settings are entirely default with the exception of
> shared_buffers being set to 40,000 and max_connections set to 400.
> I'm not sure what the meaning of most of the other settings are, so I
> haven't touched them.  The machines running the database servers are
> my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB
> SATA II drive), and a production server with two dual-core Intel

Intel chips => define more. There are Intel boxes known to have issues
under specific load scenarios with PostgreSQL (again specific
versions). To make it funnier, these are really really hard to track
down ;)

> chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550
> controller.  Both machines are running Gentoo Linux with a 2.6.1x
> kernel, and both exhibit significant performance degradation when I
> start getting tens of millions of records.
> 
> Any advice would be most appreciated.  Thanks in advance!

Cluster. One box that applies changes, and multiple boxes that read
the data.

If you cannot afford multiple boxes from the start, design your
application still to work with two connections: one connection to a
user with read/write permissions, and one connecting to a user having
only select permissions => this way you can later easily add a
loadbalancer to the mix, and use multiple postgres boxes for reading
stuff.

Andreas

In response to

Responses

pgsql-performance by date

Next:From: Luke LonerganDate: 2006-12-17 07:18:25
Subject: Re: Scaling concerns
Previous:From: Dimitri FontaineDate: 2006-12-16 21:32:59
Subject: Re: Scaling concerns

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group