Low Budget Performance

From: eric soroos <eric-psql(at)soroos(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Low Budget Performance
Date: 2002-10-28 21:56:14
Message-ID: 74264850.1176309122@[4.42.179.151]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'm looking for some advice or benchmarks comparing low end systems for a postgres installation.

Currently, I've got postgres running on the same system as the app server accessing a single fast IDE drive. The database is on the order of 1 gig, with two main tables accounting for 98% of the data. Between the app servers and the database, I'm pretty sure that neither of the main tables are cached in memory for any significant time. I'm guessing that this is sub optimal. (The data size is 1 gig now, but I will be adding more 1 gig databases to this system in the near future) I'm planning to split this into an app server and database server.

In an ideal world, I'd throw a lot of 15k scsi/raid0+1 at this. But I don't have an ideal world budget. I've got more of an ide world budget, if that. (~1k)

I know the first order of business is to ignore the hardware and make sure that I've got all of the table scans found and turned into indexes. I'm still working on that. Are there any tools that save queries and the plans, then report on the ones that are the biggest performance drags?

But since I do software, it's obviously a hardware problem. ;>

My hardware options:

Processor:

* My low cost option is to repurpose an under used p3 with onboard IDE raid and pc133 memory. The high cost option is to get a new mid range Athalon with 266/ddr memory. Will maxing out the memory mean that whatever I don't use for client connections will be used for caching the drive system? (most likely, I will be running debian woody with a 2.4 series kernel)

Drives:

* The cost difference between IDE and SCSI is roughly a factor of 2-4x. (100 gig 7200 rpm IDE can be had for a little over $100, 10k 36 gig SCSI is about $200. Am I better off throwing twice as many (4) IDE disks at the system? Does it change if I can put each IDE drive on its own channel?

Drive Layout:

* What Drive layout?

Raid?
0 gives better latency if the controller reads from whichever gets the data first. It's unclear if IDE or software raid actually does this though.
1 Gives better throughput, at a cost to latency.
5 Like 1 but with redundancy. It's unclear if I'll be able to do this without hardware SCSI raid.

Non Raid?
I've read about seperating table spaces on different drives, so that indexes and data can be written at the same time. This advice appears to be tailored to the complexity of oracle. The ideal configuration according to this info appears to be multiple drives, all mirrored individually.

Does the write ahead logging of PG mean that no matter what indexes and data are changed, that there will be one sync to disk? Does this reduce the penalty of indexes? WAL seems to mean that to get performance out of a drive array, I'd want to use the fastest (latency/throughput) logical single image I could get, not a collection of mirrored drives.

I'd appreciate any insight.

eric

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Carlos Moreno 2002-10-28 22:07:42 Re: Setting shared buffers
Previous Message Andrew Sullivan 2002-10-28 21:32:16 Re: Clusters