Skip site navigation (1) Skip section navigation (2)

Re: Hardware/OS recommendations for large databases (

From: Alan Stange <stange(at)rentec(dot)com>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Dave Cramer <pg(at)fastcrypt(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Joshua Marsh <icub3d(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware/OS recommendations for large databases (
Date: 2005-11-18 18:30:06
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Luke Lonergan wrote:
> Alan,
> On 11/18/05 9:31 AM, "Alan Stange" <stange(at)rentec(dot)com> wrote:
>> Here's the output from one iteration of iostat -k 60 while the box is
>> doing a select count(1) on a 238GB table.
>> avg-cpu:  %user   %nice    %sys %iowait   %idle
>>            0.99    0.00   17.97   32.40   48.64
>> Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
>> sdd             345.95    130732.53         0.00    7843952          0
>> We're reading 130MB/s for a full minute.  About 20% of a single cpu was
>> being used.   The remainder being idle.
> Cool - thanks for the results.  Is that % of one CPU, or of 2?  Was the
> system otherwise idle?
Actually, this was dual cpu and there was other activity during the full 
minute, but it was on other file devices, which I didn't include in the 
above output.   Given that, and given what I see on the box now I'd 
raise the 20% to 30% just to be more conservative.  It's all in the 
kernel either way; using a different scheduler or file system would 
change that result.  Even better would be using direct IO to not flush 
everything else from memory and avoid some memory copies from kernel to 
user space.  Note that almost none of the time is user time.  Changing 
postgresql won't change the cpu useage.

One IMHO obvious improvement would be to have vacuum and analyze only do 
direct IO.  Now they appear to be very effective memory flushing tools.  
Table scans on tables larger than say 4x memory should probably also use 
direct IO for reads.

>> We've done nothing fancy and achieved results you claim shouldn't be
>> possible.  This is a system that was re-installed yesterday, no tuning
>> was done to the file systems, kernel or storage array.
> Are you happy with 130MB/s?  How much did you pay for that?  Is it more than
> $2,000, or double my 2003 PC?
I don't know what the system cost.   It was part of block of dual 
opterons from Sun that we got some time ago.   I think the 130MB/s is 
slow given the hardware, but it's acceptable.  I'm not too price 
sensitive; I care much more about reliability, uptime, etc.  

>> What am I doing wrong?
>> 9 years ago I co-designed a petabyte data store with a goal of 1GB/s IO
>> (for a DOE lab).   And now I don't know what I'm doing,
> Cool.  Would that be Sandia?
> We routinely sustain 2,000 MB/s from disk on 16x 2003 era machines on
> complex queries.
Disk?!  4 StorageTek tape silos.  That would be .002 TB/s.  One has to 
change how you think when you have that much data. And hope you don't 
have a fire, because there's no backup.   That work was while I was at 
BNL.   I believe they are now at 4PB of tape and 150TB of disk.

-- Alan

In response to


pgsql-performance by date

Next:From: Luke LonerganDate: 2005-11-18 18:52:35
Subject: Re: Hardware/OS recommendations for large databases (
Previous:From: Luke LonerganDate: 2005-11-18 17:54:07
Subject: Re: Hardware/OS recommendations for large databases (

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group