Re: Hardware/OS recommendations for large databases (

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>
Cc: stange(at)rentec(dot)com, "Greg Stark" <gsstark(at)mit(dot)edu>, "Dave Cramer" <pg(at)fastcrypt(dot)com>, "Joshua Marsh" <icub3d(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware/OS recommendations for large databases (
Date: 2005-11-24 09:22:03
Message-ID: BFAAC63B.14618%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Mark,

>> Time: 197870.105 ms
>
> So 198 seconds is the uncached read time with count (Just for clarity,
> did you clear the Pg and filesystem caches or unmount / remount the
> filesystem?)

Nope - the longer time is due to the "second write" known issue with
Postgres - it writes the data to the table, but all of the pages are marked
dirty? So, always on the first scan after loading they are written again.
This is clear as you watch vmstat - the pattern on the first seq scan is
half read / half write.

>> Time: 49218.739 ms
>>
>
> and ~50 seconds is the (partially) cached read time with count

Again - the pattern here is pure read and completely non-cached. You see a
very nearly constant I/O rate when watching vmstat for the entire scan.

>> Time: 34528.053 ms

> so ~34 seconds is the (partially) cached read time for fastcount -
> I calculate this to give ~362Mb/s effective IO rate (I'm doing / by
> 1024*1024 not 1000*1000) FWIW.

The dd number uses 1000*1000, so I maintained it for the percentage of max.

> While this is interesting, you probably want to stop Pg, unmount the
> filesystem, and restart Pg to get the uncached time for fastcount too
> (and how does this compare to uncached read with dd using the same block
> size?).

I'll do it again sometime, but I've already deleted the file. I've done the
following in the past to validate this though:

- Reboot machine
- Rerun scan

And we get identical results.

> But at this stage it certainly looks the the heapscan code is pretty
> efficient - great!

Yep.

> Oh - and do you want to try out 32K block size, I'm interested to see
> what level of improvement you get (as my system is hopelessly cpu bound...)!

Yah - done so in the past and not seen any - was waiting for Alan to post
his results.

>> ============================================================================
>> Analysis:
>> ============================================================================
>> Bandwidth Percent of max
>> dd Read 407MB/s 100%
>> Count(1) 263MB/s 64.6%
>> HeapScan 383MB/s 94.1%

Note these are all in consistent 1000x1000 units.

Thanks for the test - neat trick! We'll use it to do some more profiling
some time soon...

- Luke

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2005-11-24 09:24:35 Re: Hardware/OS recommendations for large databases (
Previous Message Mark Kirkwood 2005-11-24 09:11:36 Re: Hardware/OS recommendations for large databases (