Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
Date: 2009-09-28 20:48:39
Message-ID: dcc563d10909281348l597e5a18k1c4717daeb8d7b34@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Sep 28, 2009 at 5:53 AM, Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> On Sun, Sep 27, 2009 at 07:22:47PM -0600, Scott Marlowe wrote:
>> >> dd if=/dev/zero of=test.txt bs=8192 count=1310720 conv=fdatasync
>> >> 10737418240 bytes (11 GB) copied, 169.482 s, 63.4 MB/s
>> >>
>> >> dd if=test.txt of=/dev/null bs=8192
>> >> 10737418240 bytes (11 GB) copied, 86.4457 s, 124 MB/s
>> >
>> > These look slow.
>
>> They are slow, they are not atypical for RAID5; especially the slow
>> writes with SW RAID-5 are typical.
>
> Wow, no wonder it's shunned so much here!  I'd not realized before that
> it incurred such a hit.
>
>> I'd try a simple test on a 2 or 3 disk RAID-0 for testing purposes
>> only to see how much faster a RAID-10 array of n*2 disks could be.
>> The increase in random write performance for RAID-10 will be even more
>> noticeable.
>
> I was thinking that the higher the bandwidth the IO subsystem could push
> the data though the more important a larger block size would be--less
> to and fro between the kernel and userspace.  If the OP reported
> considerably higher CPU usage than expected then he could try rebuilding
> with larger block sizes to see if it helps.
>
> I'm assuming that PG only issues block sized reads?  How does changing
> block size affect index access performance; does it slow it down because
> it has to pull the whole block in?

My experience has been that the stripe size of the RAID array is what
matters. It's about a compromise between something that works well
with sequential scans and something that works well with random access
at the same time. On a purely transactional db, having a stripe size
in the 8k to 64k range seems optimal, depending on the RAID setup /
hardware. For something on a reporting database, getting the stripe
size in the 32k to 512k range is usually best. Most of the time I hit
32 or 64k stripes and it's decent at both.

Random IO is the killer. If you've got at least 10 to 20% random IO,
it's what to tune for usually.

I haven't played much with larger pg blocksize on pgsql much in recent
years. I remember someone doing so and seeing better performance up
to 32k, but that's the largest block size pg supports I believe, and
since that code path isn't as well tested as 8k, I just stick to 8k.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Sam Mason 2009-09-28 20:52:36 Re: bulk inserts
Previous Message Reid Thompson 2009-09-28 20:29:51 Re: computed values in plpgsql