Re: performance issues for processing more then 150000

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Jakab Laszlo" <jakablaszlo(at)sofasoft(dot)ro>, "Justin Clift" <justin(at)postgresql(dot)org>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance issues for processing more then 150000
Date: 2003-02-23 20:44:06
Message-ID: 200302231244.06242.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Jakab,

Some simple tips, which is what I thing you were really asking for:

Are you adding the records in a single overnight or downtime load batch? If
so, the fastest way by far is to:
1) disable all triggers and constraints on the table temporarily, and some or
all of the indexes
2) put all the data into a COPY file (tab-delimited text; see COPY docs)
3) load the data through a COPY statement
4) re-enable the trigger and constraints and re-build the indexes
The degree to which you need to do 1) and 4) depends on how much punishment
your system can take; start out by dropping and rebuilding just the triggers
and move up from there until the load finishes in a satisfactory time.

If the records are being added on a continuous basis and not in a big batch,
then take the following precautions:
1) put as many inserts as possible into transaction batches
2) minimize your use of constraints, triggers, and indexes on the tables being
loaded
3) consdier using a "buffer table" to hold records about to be loaded while
data integrity checks and similar are performed.

> Unfortunatelly the hardware budget should be keept as low as possible.
> I was thinking is there could be reliable solution based on dual processor
> and ATA 133 raid mirroring normally with some gigs of memory.

1 gig of RAM may be plenty. Your main bottleneck will be your disk channel.
If I were setting up your server, I might do something like:

1) buy a motherboard with 4 ATA controllers.
2) put disks like:
channel 0: 1 matched pair disks
channel 1 + 2: 1 matched quartet of disks
channel 3: single ATA disk
(for Postgresql, more, smaller disks is almost always better than a few big
ones.) (alternately, set up everythin in one big RAID 5 array with at least 6
disks. There is argument about which is better)
3) Format the above as a RAID 1 pair on channel 0 and a RAID 1+0 double pair
on channel 1 using Linux software RAID
4) Put Linux OS + swap on channel 0. Put the database on channel 1+2. Put
the pg_xlog (the transaction log) on channel 3. Make sure to use a version
of Linux with kernel 2.4.19 or greater!

That's just one configuration of several possible, of course, but may serve
your purposes admirably and relatively cheaply.

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2003-02-23 20:52:30 Re: slow query
Previous Message Tom Lane 2003-02-22 04:23:47 Re: Really bad insert performance: what did I do wrong?