Re: Hardware recommendations to scale to silly load

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hardware recommendations to scale to silly load
Date: 2003-08-27 02:59:06
Message-ID: 1061953146.18108.291.camel@haggis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Tue, 2003-08-26 at 20:35, matt wrote:
> I'm wondering if the good people out there could perhaps give me some
> pointers on suitable hardware to solve an upcoming performance issue.
> I've never really dealt with these kinds of loads before, so any
> experience you guys have would be invaluable. Apologies in advance for
> the amount of info below...
>
> My app is likely to come under some serious load in the next 6 months,
> but the increase will be broadly predictable, so there is time to throw
> hardware at the problem.
>
> Currently I have a ~1GB DB, with the largest (and most commonly accessed
> and updated) two tables having 150,000 and 50,000 rows.
>
> A typical user interaction with the system involves about 15
> single-table selects, 5 selects with joins or subqueries, 3 inserts, and
> 3 updates. The current hardware probably (based on benchmarking and
> profiling) tops out at about 300 inserts/updates *or* 2500 selects per
> second.
>
> There are multiple indexes on each table that updates & inserts happen
> on. These indexes are necessary to provide adequate select performance.
>
> Current hardware/software:
> Quad 700MHz PIII Xeon/1MB cache
> 3GB RAM
> RAID 10 over 4 18GB/10,000rpm drives
> 128MB battery backed controller cache with write-back enabled

Much more cache needed. Say 512MB per controller?

> Redhat 7.3, kernel 2.4.20
> Postgres 7.2.3 (stock redhat issue)

Upgrade to Pg 7.3.4!

> I need to increase the overall performance by a factor of 10, while at
> the same time the DB size increases by a factor of 50. e.g. 3000

Are you *sure* about that???? 3K updates/inserts per second xlates
to 10,800,000 per hour. That, my friend, is a WHOLE HECK OF A LOT!

> inserts/updates or 25,000 selects per second, over a 25GB database with

Likewise: 90,000,000 selects per hour.

> most used tables of 5,000,000 and 1,000,000 rows.
>
> Notably, the data is very time-sensitive, so the active dataset at any

During the 1 hour surge, will SELECTs at 10 minutes after the
hour depend on INSERTs at 5 minutes after the hour?

If not, maybe you could pump the INSERT/UPDATE records into
flat files, to be processed after the 1-hour surge is complete.
That may reduce the h/w requirements.

> hour is almost certainly going to be more on the order of 5GB than 25GB
> (plus I'll want all the indexes in RAM of course).
>
> Also, and importantly, the load comes but one hour per week, so buying a

Only one hour out of 168????? May I ask what kind of app it is?

> Starfire isn't a real option, as it'd just sit idle the rest of the
> time. I'm particularly interested in keeping the cost down, as I'm a
> shareholder in the company!

What a fun exercises. Ok, lets see:
Postgres 7.3.4
RH AS 2.1
12GB RAM
motherboard with 64 bit 66MHz PCI slots
4 - Xenon 3.0GHz (1MB cache) CPUs
8 - 36GB 15K RPM as RAID10 on a 64 bit 66MHz U320 controller
having 512MB cache (for database)
2 - 36GB 15K RPM as RAID1 on a 64 bit 66MHz U320 controller
having 512MB cache (for OS, swap, WAL files)
1 - library tape drive plugged into the OS' SCSI controller. I
prefer DLT, but that's my DEC bias.
1 - 1000 volt UPS.

If you know when the flood will be coming, you could perform
SELECT * FROM ... WHERE statements on an indexed field, to
pull the relevant data into Linux's buffers.

Yes, the 8 disks is capacity-overkill, but the 8 high-speed
spindles is what you're looking for.

> So what do I need? Can anyone who has (or has ever had) that kind of
> load in production offer any pointers, anecdotes, etc? Any theoretical
> musings also more than welcome. Comments upon my sanity will be
> referred to my doctor.
>
> If the best price/performance option is a second hand 32-cpu Alpha
> running VMS I'd be happy to go that way ;-)

I'd love to work on a GS320! You may even pick one up for a million
or 2. The license costs for VMS & Rdb would eat you, though.

Rdb *does* have ways, though, using large buffers and hashed indexes,
with the table tuples stored on the same page as the hashed index
keys, to make such accesses *blazingly* fast.

> Many thanks for reading this far.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron(dot)l(dot)johnson(at)cox(dot)net
Jefferson, LA USA

"A C program is like a fast dance on a newly waxed dance floor
by people carrying razors."
Waldi Ravens

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc G. Fournier 2003-08-27 03:21:23 Beta2 Tag'd and Bundled ...
Previous Message Bill Moran 2003-08-27 02:11:48 Re: Hardware recommendations to scale to silly load

Browse pgsql-performance by date

  From Date Subject
Next Message Bruce Momjian 2003-08-27 03:25:41 Re: [GENERAL] Replication Ideas
Previous Message Bill Moran 2003-08-27 02:11:48 Re: Hardware recommendations to scale to silly load