Re: Low Budget Performance

From: eric soroos <eric-psql(at)soroos(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Low Budget Performance
Date: 2002-10-29 18:43:33
Message-ID: 78755096.1176234283@[4.42.179.151]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Josh,

Thanks for the reply.

> One gig is a large database for a single IDE drive -- especially with
> multiple client connections.

That's good to know.

Is a scsi system that much better? Looking at prices, scsi is 1/2 the capacity and double the price for the 80 gig 7200rpm ide vs 36 gig 10k rpm scsi. Assuming that I'll never run out of space before running out of performance, I can dedicate 2x the number of ide drives to the problem.

> > Well, IDE RAID looks like nice optio to me, but before finalising
> > RAID config.,
> > I would advice to test performance and scalability with separate
> > database
> > server and couple of Gigs of RAM.
>
> I'm not convinced that current IDE RAID actually improves database disk
> throughput -- there's a lot of overhead in the one controller I tried
> (Promise). Does anyone have some statistics they can throw at me?

All of the benchmarks that I've seen show that IDE raid is good for large operations, but for random seek and small data transfers, you don't get anywhere near the expected scaling.

> A cheaper and easier method, involving 3-4 disks:
>
> Channel 1, Disk 1: Operating System, Swap, and PostgreSQL log
> Channel 1, Disk 2: WAL Files
> Channel 2, Disk 1: Database
> Channel 2, Disk 2 (optional): 2nd database data

With IDE, I think I can manage to put each drive on a seperate channel. I've either got one extra controller onboard, or I can add a 4 channel pci card. From what I've read, this is one of the more important factors in IDE performance.

> *however*, if you have multiple databases being simulteaneously
> accessesed, you will want to experiment with shuffling around the
> databases and WAL files to put them on different disks. The principle
> is to divide the disk tasks that are simultaenous ammonng as many disks
> as possible; thus the WAL files always do better on a different disk
> and channel than the database.

That's what I've read about database disk system design. Reduce spindle contention by using lots of drives. (especially in Philip Greenspun's book, but he's talking about 2x7 drives as a minimal configuration and 2x21 as ideal for larger systems. And when licensing is more expensive than that sort of drive system, it's all roundoff error.)

So, assuming that I have three databases with roughly equal load on them, does it make sense to partition them like:

disk 0: os/swap/log/backup staging
disk 1: WAL 1, DB 2
disk 2: WAL 2, DB 3
disk 3: WAL 3, DB 1

Or, in a slightly bigger drive system split 2 ways then mirrored.
disk 0: os etc
Disk 1,2: WAL 1, DB 2
Disk 3,4: WAL 2, DB 1

From an admin point of view, would this be done with alternate locations, symlinks, or multiple concurrent pg processes?

> > > Does the write ahead logging of PG mean that no matter what indexes
> > and data are changed, that there will be one sync to disk? Does this
> > reduce the penalty of indexes?
>
> In a word: No. Depending on the size of the update, there may be
> multiple synchs. And indexes do carry a significant penalty on large
> updates; just try runninng 10,000 updates to an indexed column as one
> transaction, and the penalty will be obvious. In fact, for my data
> load procedures, I tend to drop and re-create indexes.

Most of my update procedures are single row updates, with the exception being things that are already background tasks that the user doesn't notice the difference between 10 and 20 sec. So maybe I'm lucky there.

> Mirrored drives are different than RAID. However, you are correct
> that the redundancy/fail-over factor in some RAID and Mirroring comes
> at a performance penalty.

From howtos I've seen, there _can_ be a speed boost with mirroring on read using the linux kernel raid 1. Write performance suffers though.

> But you need to determine where you are actually losing time.

That looks like it will get me started.

eric

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andrew Sullivan 2002-10-29 20:13:41 Re: Low Budget Performance
Previous Message Andrew Sullivan 2002-10-29 17:31:10 Re: Low Budget Performance