From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
Date: 2009-09-27 21:01:27
Message-ID: 20090927210127.GS5407@samason.me.uk
Lists: pgsql-general

On Sun, Sep 27, 2009 at 09:04:31PM +0200, Gerhard Wiesinger wrote:
> I'm talking about 2 cases
> 1.) Sequential scans
> 2.) Bitmap index scans
> which both hopefully end up physically in blocks which are next to
> each other and where larger block sizes can benefit.

Unfortunately it's all a bit more complicated than you hope :(
Sequential scans *may* benefit from larger block sizes, but not much.
Your testing below doesn't seem to test this at all though.

A bitmap index scan will still be accessing blocks in a somewhat random
order (depending on how much correlation there is between the index and
the physical rows, and what the selectivity is like). The result of any
index scan (bitmap or otherwise) must come back in the correct order
(PG is designed around this) and the best idea to speed this up so far
has been Greg's read-ahead patch. This pushes more information down
into the kernel so it can start reading blocks in before PG actually
gets to them. They are still going to be somewhat out of order so, in
the general case, you're going to be limited by the seek speed of your
disks.
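
For what it's worth, in 8.4 that prefetching is exposed as the
effective_io_concurrency setting, which tells a bitmap heap scan how
many blocks to ask the kernel for ahead of time (it only does anything
on platforms with posix_fadvise, e.g. Linux). A rough sketch, using a
hypothetical table "t" with an index on column "v":

SET effective_io_concurrency = 4; -- per session; the default is 1
EXPLAIN ANALYZE SELECT * FROM t WHERE v BETWEEN 100 AND 200;

The plan should show a Bitmap Index Scan feeding a Bitmap Heap Scan;
the heap pages get visited in physical order, which is what gives the
prefetching its chance.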

> Detailed benchmarks are below, the original one from PostgreSQL have
> already been posted.

Which was saying what? You were getting 32MB/s and 53MB/s from what?

As a quick test, maybe:

create table benchmark ( i integer, j text, k text );
begin;
truncate benchmark;
insert into benchmark
  select generate_series(1, 1024*1024*10),
         '0123456789abcdef', '0123456789abcdef';
commit;

The first run of:

select count(*) from benchmark;

Will cause the "hint" bits to get set and will cause a lot of writing to
happen. Subsequent runs will be testing read performance. My simple
SATA disk at home gets ~90MB/s when tested with hdparm, which I'm
taking as
the upper performance limit. When I perform the above query, I see the
disk pulling data back at 89.60MB/s (stddev of 2.27) which is actually
above what I was expecting (there's a filesystem in the way). CPU usage
is around 10%. Tested by turning on "\timing" mode in psql, dropping
caches and running:

SELECT 715833344 / (7597.216 / 1000) / (1024 * 1024);

Where 715833344 is the size (in bytes) of the file backing the
benchmark table above and 7597.216 is the time taken in ms; the result
is the throughput in MB/s (about 89.9 for this run).
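
In case it's useful, a sketch of how I take that measurement (this
assumes Linux, root for the cache drop, and a database called "test";
adjust both to taste):

sync
echo 3 > /proc/sys/vm/drop_caches
psql -c "SELECT pg_relation_size('benchmark');" test

pg_relation_size() saves having to stat the file on disk by hand, and
dropping the page cache first makes sure the timed run actually hits
the disk rather than memory.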

> http://pgfoundry.org/projects/pgiosim/

This seems to just be testing seek performance, not sequential
performance.

> dd if=/dev/zero of=test.txt bs=8192 count=1310720 conv=fdatasync
> 10737418240 bytes (11 GB) copied, 169.482 s, 63.4 MB/s
>
> dd if=test.txt of=/dev/null bs=8192
> 10737418240 bytes (11 GB) copied, 86.4457 s, 124 MB/s

These look slow. RAID5 isn't going to be amazing, but it should be
better than this. I'd spend some more time optimizing your system
config before worrying about PG. If I can read at 90MB/s from a single
stock SATA drive, you should be almost hitting 200MB/s with this, or
300MB/s in a RAID1 across three drives.
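
As a first step I'd compare the raw device against the filesystem,
something like the following (assuming the array shows up as /dev/md0;
substitute your actual device):

hdparm -t /dev/md0
dd if=test.txt of=/dev/null bs=1M

If hdparm gets the speed you'd expect but dd through the filesystem
doesn't, look at the filesystem and readahead settings; if both are
slow, it's the array itself. Trying a bigger bs with dd also shows
whether the userspace block size makes any difference (it normally
doesn't, since the kernel's readahead dominates).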

--
Sam http://samason.me.uk/
