Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

From: Gerhard Wiesinger <lists(at)wiesinger(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
Date: 2009-10-03 07:11:12
Message-ID: alpine.LFD.2.00.0910030854090.13740@bbs.intern
Lists: pgsql-general

On Fri, 2 Oct 2009, Greg Smith wrote:

> On Fri, 2 Oct 2009, Gerhard Wiesinger wrote:
>
>> Larger blocksizes also reduce IOPS (I/Os per second) which might be a
>> critical threshold on storage systems (e.g. Fibre Channel systems).
>
> True to some extent, but don't forget that IOPS is always relative to a block
> size in the first place. If you're getting 200 IOPS with 8K blocks,
> increasing your block size to 128K will not result in your getting 200 IOPS
> at that larger size; the IOPS number at the larger block size is going to
> drop too. And you'll pay the penalty for that IOPS number dropping every
> time you're accessing something that would have only been an 8K bit of I/O
> before.
>

Yes, there will be some (very small) drop in IOPS when the blocksize is
higher, but today's disks have plenty of throughput headroom when you
compare IOPS*128k against e.g. 100MB/s. I've done some Excel calculations
which support this.
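To make that concrete, here is a rough worked example (the IOPS figures
are illustrative assumptions, not measurements):

   200 IOPS * 8k    =  1.6 MB/s
   150 IOPS * 128k  = 19.2 MB/s   (even assuming the IOPS rate drops)

Both are still well below the roughly 100MB/s a single modern disk can
sustain sequentially, so the larger requests are nowhere near saturating
the available bandwidth.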

> The trade-off is very application dependent. The position you're advocating,
> preferring larger blocks, only makes sense if your workload consists mainly
> of larger scans. Someone who is pulling scattered records from throughout a
> larger table will suffer with that same change, because they'll be reading a
> minimum of 128K even if all they really needed was a few bytes. That
> penalty ripples all the way from the disk I/O upwards through the buffer
> cache.
>

I wouldn't read 128k blocks all the time. I would do the following: when
e.g. B0, B127 and B256 have to be read, I would issue normal 8k random
block I/Os.

When B1, B2, B3, B4, B5, B7, B8, B9, B10 are needed, I would make 2
requests with the largest possible blocksize:
1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k

In this case, since B5 and B7 are only one block apart, we could also
discuss whether we should read B1-B10=10*8k=80k in one read request and
simply discard B6.

That would reduce the IOPS by a factor of 4-5 in that scenario, and
therefore throughput would go up (a small C sketch of such coalescing
follows below).
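Just to illustrate the idea, here is a minimal user-space sketch of the
coalescing I have in mind. It is not PostgreSQL code; the file name, the
gap tolerance and the 1MB cap are assumptions for the example:

#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define BLCKSZ      8192
#define MAX_GAP     1                        /* merge across one unused block */
#define MAX_REQUEST (1024 * 1024 / BLCKSZ)   /* cap each request at 1MB       */

/* Coalesce a sorted list of 8k block numbers into as few pread() calls
 * as possible.  Runs may span a gap of MAX_GAP missing blocks (the gap
 * blocks are read and discarded) and are capped at MAX_REQUEST blocks. */
static void read_blocks(int fd, const long *blocks, int nblocks)
{
    char *buf = malloc((size_t) MAX_REQUEST * BLCKSZ);
    int   i = 0;

    if (buf == NULL)
        return;

    while (i < nblocks)
    {
        long start = blocks[i];
        long end = start;
        int  j = i + 1;

        /* extend the run while the next block is adjacent or within
         * MAX_GAP, and the request stays below the cap */
        while (j < nblocks &&
               blocks[j] - end <= 1 + MAX_GAP &&
               blocks[j] - start + 1 <= MAX_REQUEST)
        {
            end = blocks[j];
            j++;
        }

        /* one large request instead of (end - start + 1) single 8k reads */
        pread(fd, buf, (size_t) (end - start + 1) * BLCKSZ,
              (off_t) start * BLCKSZ);

        i = j;
    }
    free(buf);
}

int main(void)
{
    /* the example from above: B1-B10 become one 80k request (B6 is read
     * and discarded); B127 and B256 stay single 8k random reads */
    long blocks[] = {1, 2, 3, 4, 5, 7, 8, 9, 10, 127, 256};
    int  fd = open("testfile", O_RDONLY);

    if (fd >= 0)
    {
        read_blocks(fd, blocks, (int) (sizeof(blocks) / sizeof(blocks[0])));
        close(fd);
    }
    return 0;
}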

> It's easy to generate a synthetic benchmark workload that models some
> real-world applications and see performance plunge with a larger block size.
> There certainly are others where a larger block would work better. Testing
> either way is complicated by the way RAID devices usually have their own
> stripe sizes to consider on top of the database block size.
>

Yes, there are block device read-ahead buffers and also RAID stripe
caches. But neither seems to help much in the tested HEAP BITMAP SCAN
scenario, nor in practical PostgreSQL performance measurements.

But the modelled pgiosim isn't a synthetic benchmark; it models the same
access pattern as a real-world HEAP BITMAP SCAN in PostgreSQL, where some
blocks are consecutive, at least logically in the filesystem (and with
some probability also physically on disk), yet each is currently read
with a separate 8k request even when 2 or more blocks could be read in
one request.

BTW: I would also cap the blocksize of such coalesced requests at some
upper limit (e.g. 1MB).

Ciao,
Gerhard
