Re: [Fwd: Re: 8192 BLCKSZ ?]

From: mlw <markw(at)mohawksoft(dot)com>
To: kogorman(at)pacbell(dot)net, Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Fwd: Re: 8192 BLCKSZ ?]
Date: 2000-11-29 12:25:46
Message-ID: 3A24F5CA.7CE2D1FC@mohawksoft.com
Lists: pgsql-hackers

Kevin O'Gorman wrote:
>
> mlw wrote:
> >
> > Tom Samplonius wrote:
> >
> > > On Tue, 28 Nov 2000, mlw wrote:
> > >
> > > > Tom Samplonius wrote:
> > > > >
> > > > > On Mon, 27 Nov 2000, mlw wrote:
> > > > >
> > > > > > This is just a curiosity.
> > > > > >
> > > > > > > Why is the default postgres block size 8192? These days we have
> > > > > > > caching file systems, high-speed DMA disks, and hundreds of megabytes
> > > > > > > of RAM, maybe even gigabytes. Surely 8K is inefficient.
> > > > >
> > > > > I think it is a pretty wild assumption to say that 32k is more efficient
> > > > > than 8k. Considering how blocks are used, 32k may be in fact quite a bit
> > > > > slower than 8k blocks.
> > > >
> > > > I'm not so sure I agree. Perhaps I am off base here, but I did a bit of
> > > > OS profiling a while back when I was doing a DICOM server. I
> > > > experimented with block sizes and found that the best throughput on
> > > > Linux and Windows NT was at 32K. The graph I created showed a steady
> > > > increase in performance and a drop just after 32K, then steady from
> > > > there. In Windows NT it was more pronounced than it was in Linux, but
> > > > Linux still exhibited a similar trait.
> > >
> > > > You are a bit off base here. The typical access pattern is random IO,
> > > > not sequential. If you use a large block size in Postgres, Postgres
> > > > will read and write more data than necessary. Which is faster? 1000 x 8K
> > > > IOs? Or 1000 x 32K IOs?
> >
> > I can sort of see your point, but the 8K vs 32K is not a linear
> > relationship. The big hit is the disk I/O operation, more so than just
> > the data size. It may be almost as efficient to write 32K as it is to
> > write 8K. While I do not know the exact numbers, and it varies by OS and
> > disk subsystem, I am sure that writing 32K is not even close to 4x more
> > expensive than 8K. Think about seek times, writing anything to the disk
> > is expensive regardless of the amount of data. Most disks today have
> > many heads, and are RL encoded. It may only add 10us (approx. 1-2
> > sectors of a 64 sector drive spinning 7200 rpm) to a disk operation
> > which takes an order of magnitude longer positioning the heads.
> >
> > The overhead of an additional 24K is minute compared to the cost of a
> > disk operation. So if any measurable benefit can come from having bigger
> > buffers, i.e. having more data available per disk operation, it will
> > probably be faster.
>
> This is only part of the story. It applies best when you're going
> to use sequential scans, for instance, or otherwise use all the info
> in any block that you fetch. However, when your blocks are 8x bigger,
> your number of blocks in the disk cache is 8x fewer. If you're
> accessing random blocks, your hopes of finding the block in the
> cache are affected (probably not 8x, but there is an effect).
>
> So don't just blindly think that bigger blocks are better. It
> ain't necessarily so.
>

First, the ratio between 8K and 32K is 4, not 8.

The problem is you are looking at these numbers as if there is a linear
relationship between the 8 and the 32. You are thinking 8 is 1/4 the
size of 32, so it must be 1/4 the amount of work. This is not true at
all.
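
To make the non-linearity concrete, here is a toy cost model for a single
random disk I/O. It is only a sketch: the seek time and transfer rate are
assumed placeholder figures, not measurements from any real drive, and the
function name io_time_ms is made up for this example.

```python
# Toy model of one random disk I/O: positioning cost plus transfer cost.
# All constants are illustrative assumptions, not measured values.

AVG_SEEK_MS = 8.0          # assumed average seek time
RPM = 7200                 # rotation speed mentioned earlier in the thread
TRANSFER_MB_PER_S = 20.0   # assumed sustained media transfer rate


def io_time_ms(block_bytes: int) -> float:
    """Estimated time for one random I/O of block_bytes, in milliseconds."""
    # On average the platter turns half a revolution before the data arrives.
    rotational_latency_ms = 0.5 * 60_000.0 / RPM
    transfer_ms = block_bytes / (TRANSFER_MB_PER_S * 1_000_000) * 1000.0
    return AVG_SEEK_MS + rotational_latency_ms + transfer_ms


t8 = io_time_ms(8 * 1024)
t32 = io_time_ms(32 * 1024)
ratio = t32 / t8
print(f"8K: {t8:.2f} ms  32K: {t32:.2f} ms  ratio: {ratio:.2f}")
```

With any plausible choice of constants the positioning terms dominate, so
the ratio stays far below 4. The model ignores caching, queueing and
read-ahead, so it says nothing about which block size wins overall.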

Many operating systems use a fixed-size memory block allocation for
their disk cache. They do not allocate a new block for every disk
request; they maintain a pool of fixed-size buffer blocks. So if you
use fewer bytes than the OS block size, you waste the difference between
your block size and the block size of the OS cache entry.

I'm pretty sure Linux uses a 32K buffer size in its cache, and I'm
pretty confident that NT does as well from my previous tests.

So, in effect, an 8K block may waste 3/4 of the memory in the disk
cache.
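
The arithmetic behind that 3/4 figure can be sketched as follows. This
assumes the cache stores one database block per fixed-size buffer, and it
takes the 32K buffer size as a given even though it is only a guess above;
the helper name wasted_fraction is hypothetical.

```python
def wasted_fraction(db_block: int, cache_buffer: int) -> float:
    """Fraction of cache memory left unused when each database block
    occupies whole fixed-size cache buffers (an assumed caching model)."""
    if db_block <= 0 or cache_buffer <= 0:
        raise ValueError("sizes must be positive")
    # A block larger than one buffer spans several; round up to whole buffers.
    buffers_needed = -(-db_block // cache_buffer)  # ceiling division
    return 1.0 - db_block / (buffers_needed * cache_buffer)


# 32K is the unverified buffer size guessed above; 4K is included only to
# show how strongly the result depends on that guess.
for buffer_size in (32 * 1024, 4 * 1024):
    for block_size in (8 * 1024, 32 * 1024):
        waste = wasted_fraction(block_size, buffer_size)
        print(f"buffer {buffer_size:>6}  block {block_size:>6}  wasted {waste:.0%}")
```

Under the 32K assumption an 8K block wastes 75% of its buffer, matching
the 3/4 above; if the cache buffer were 8K or smaller, the waste would be
zero, so the conclusion depends entirely on the real buffer size.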


http://www.mohawksoft.com
