Skip site navigation (1) Skip section navigation (2)

Scaling concerns

From: tsuraan <tsuraan(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Scaling concerns
Date: 2006-12-16 17:26:02
Message-ID: 84fb38e30612160926l609ec55cs352b2495289bae0c@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
I'm writing a webmail-type application that is meant to be used in a
corporate environment.  The core of my system is a Postgres database
that is used as a message header cache.  The two (relevant) tables
being used are pasted into the end of this message.  My problem is
that, as the messages table increases to tens of millions of rows,
pgsql slows down considerably.  Even an operation like "select
count(*) from messages" can take minutes, with a totally idle system.
Postgres seems to be the most scalable Free database out there, so I
must be doing something wrong.

As for the most common strategy of having a slower (more rows)
"archival" database and a smaller, faster "live" database, all the
clients in the company are using their normal corporate email server
for day-to-day email handling.  The webmail is used for access email
that's no longer on the corporate server, so it's not really simple to
say which emails should be considered live and which are really
out-of-date.

My postgres settings are entirely default with the exception of
shared_buffers being set to 40,000 and max_connections set to 400.
I'm not sure what the meaning of most of the other settings are, so I
haven't touched them.  The machines running the database servers are
my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB
SATA II drive), and a production server with two dual-core Intel
chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550
controller.  Both machines are running Gentoo Linux with a 2.6.1x
kernel, and both exhibit significant performance degradation when I
start getting tens of millions of records.

Any advice would be most appreciated.  Thanks in advance!

Tables:

CREATE TABLE EmailAddresses (
  emailid   SERIAL PRIMARY KEY,     --  The unique identifier of this address
  name      TEXT NOT NULL,          --  The friendly name in the address
  addrspec  TEXT NOT NULL,          --  The user(at)domain part of the address
  UNIQUE(name, addrspec)
);

and

CREATE TABLE Messages (
  -- Store info:
  msgkey    BIGSERIAL PRIMARY KEY,  --  Unique identifier for a message
  path      TEXT NOT NULL,          --  Where the message is on the file system
  inserted  TIMESTAMP DEFAULT now(),--  When the message was fetched
  -- Message Info:
  msgid     TEXT UNIQUE NOT NULL,   --  Message's Message-Id field
  mfrom     INTEGER                 --  Who sent the message
            REFERENCES EmailAddresses
            DEFAULT NULL,
  mdate     TIMESTAMP DEFAULT NULL, --  Message "date" header field
  replyto   TEXT DEFAULT NULL,      --  Message-ID of replied-to message
  subject   TEXT DEFAULT NULL,      --  Message "subject" header field
  numatch   INTEGER DEFAULT NULL,   --  Number of attachments
  UNIQUE(path)
);

Responses

pgsql-performance by date

Next:From: Steinar H. GundersonDate: 2006-12-16 17:32:21
Subject: Re: Scaling concerns
Previous:From: Steinar H. GundersonDate: 2006-12-16 16:04:13
Subject: Re: New to PostgreSQL, performance considerations

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group