Re: Hardware/OS recommendations for large databases (

From: Alan Stange <stange(at)rentec(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, Dave Cramer <pg(at)fastcrypt(dot)com>, Joshua Marsh <icub3d(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware/OS recommendations for large databases (
Date: 2005-11-18 19:39:30
Message-ID: 437E2DF2.50906@rentec.com
Lists: pgsql-performance

Greg Stark wrote:
> Alan Stange <stange(at)rentec(dot)com> writes:
>
>
>> Luke Lonergan wrote:
>>
>>> Alan,
>>>
>>> On 11/18/05 9:31 AM, "Alan Stange" <stange(at)rentec(dot)com> wrote:
>>>
>>>
>>>
>>>> Here's the output from one iteration of iostat -k 60 while the box is
>>>> doing a select count(1) on a 238GB table.
>>>>
>>>> avg-cpu: %user %nice %sys %iowait %idle
>>>> 0.99 0.00 17.97 32.40 48.64
>>>>
>>>> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
>>>> sdd 345.95 130732.53 0.00 7843952 0
>>>>
>>>> We're reading 130MB/s for a full minute. About 20% of a single cpu was
>>>> being used. The remainder being idle.
>>>>
>>>>
>>> Cool - thanks for the results. Is that % of one CPU, or of 2? Was the
>>> system otherwise idle?
>>>
>>>
>> Actually, this was dual cpu
>>
>
> I hate to agree with him but that looks like a dual machine with one CPU
> pegged. Yes most of the time is being spent in the kernel, but you're still
> basically cpu limited.
>
> That said, 130MB/s is nothing to sneeze at, that's maxing out two high end
> drives and quite respectable for a 3-disk stripe set, even reasonable for a
> 4-disk stripe set. If you're using 5 or more disks in RAID-0 or RAID 1+0 and
> only getting 130MB/s then it does seem likely the cpu is actually holding you
> back here.
>
> Still it doesn't show Postgres being nearly so CPU wasteful as the original
> poster claimed.
>
Yes and no. One cpu is clearly idle. The second cpu is about 40% busy
and 60% idle (shown as iowait in the numbers above, which iostat
averages over both cpus). Other things were also running during that
1 minute snapshot and account for part of the 40%. In some iostat
outputs that I didn't post, the cpu time was ~20%.

So, you can take your pick. The single cpu usage is somewhere between
20% and 40%. As I can't remove other users of the system, it's the best
measurement that I can make right now.
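
To show where the 40% and 60% figures come from, here is a minimal sketch that rescales the posted iostat averages to "percent of one cpu". It assumes the scan's load sits entirely on one of the two cpus; the function and variable names are illustrative only.

```python
# Figures copied from the iostat output quoted above.
# iostat reports avg-cpu as an average over all cpus in the box.
NUM_CPUS = 2  # dual Opteron 248 machine
avg_cpu = {"user": 0.99, "nice": 0.00, "sys": 17.97, "iowait": 32.40, "idle": 48.64}


def as_single_cpu_percent(avg_percent, num_cpus=NUM_CPUS):
    """Rescale a machine-wide average to percent of one cpu.

    Assumption: the whole load is on a single cpu.
    """
    return avg_percent * num_cpus


busy = as_single_cpu_percent(avg_cpu["user"] + avg_cpu["nice"] + avg_cpu["sys"])
waiting = as_single_cpu_percent(avg_cpu["iowait"])
idle = as_single_cpu_percent(avg_cpu["idle"])

# Cross-check the device line: kB_read divided by the 60 second interval.
kb_read, interval_s = 7843952, 60
kb_per_s = kb_read / interval_s

print(f"busy:    {busy:.1f}% of one cpu")
print(f"iowait:  {waiting:.1f}% of one cpu")
print(f"idle:    {idle:.1f}% of one cpu (roughly the whole other cpu)")
print(f"read rate: {kb_per_s:.2f} kB/s")
```

Under that assumption the busy share comes out just under 40% of one cpu, with the rest of that cpu in iowait and the other cpu essentially idle.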

Either way, it's not close to being cpu bound. This is with Opteron
248, 2.2GHz cpus.

Note that the storage system has been a bit disappointing: it's an IBM
FAStT600 with a 200MB/s fiber attachment. It could be better, but
it hasn't been the bottleneck in our work, so we haven't put any energy
into it.

>> It's all in the kernel either way; using a different scheduler or file
>> system would change that result. Even better would be using direct IO to not
>> flush everything else from memory and avoid some memory copies from kernel
>> to user space. Note that almost none of the time is user time. Changing
>> postgresql won't change the cpu usage.
>>
> Well changing to direct i/o would still be changing Postgres so that's
> unclear. And there are plenty of more mundane ways that Postgres is
> responsible for how efficiently or not the kernel is used. Just using fewer
> syscalls to do the same amount of reading would reduce cpu consumption.
Absolutely. This is why we're using a 32KB block size and also switched
to using O_SYNC for the WAL syncing method. That's many MB/s that
don't need to be cached in the kernel (thus evicting other data), and we
avoid all the fsync/fdatasync syscalls.
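
As a rough illustration of the syscall point, the sketch below reads the same file once in 8KB requests (PostgreSQL's default block size) and once in 32KB requests, counting the read calls. It is not PostgreSQL code: the scratch file, `count_reads`, and the 1 MiB size are all made up for the example, and each `os.read` stands in for one read(2) call.

```python
import os
import tempfile


def count_reads(path, block_size):
    """Read a whole file with os.read; return (number of calls, bytes read)."""
    calls = 0
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, block_size)  # one read request per loop pass
            calls += 1
            if not chunk:  # empty result means end of file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return calls, total


# Hypothetical 1 MiB scratch file standing in for part of a table.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (1024 * 1024))
    scratch_path = tmp.name

try:
    calls_8k, bytes_8k = count_reads(scratch_path, 8 * 1024)
    calls_32k, bytes_32k = count_reads(scratch_path, 32 * 1024)
finally:
    os.unlink(scratch_path)

print(f"8KB requests:  {calls_8k} calls for {bytes_8k} bytes")
print(f"32KB requests: {calls_32k} calls for {bytes_32k} bytes")
```

The same number of bytes is read either way; the larger request size simply needs fewer trips into the kernel.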

The purpose of direct IO isn't to make the vacuum or analyze faster, but
to lessen their impact on queries with someone waiting for the
results. That's our biggest hit: running a sequential scan on 240GB
of data and flushing everything else out of memory.

Now that I'm thinking about this a bit, a big chunk of time is probably
being lost in TLB misses and other virtual memory events that would be
avoided if a larger page size were being used.

-- Alan
