
Re: [Fwd: Re: 8192 BLCKSZ ?]

From: mlw <markw(at)mohawksoft(dot)com>
To: kogorman(at)pacbell(dot)net, Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Fwd: Re: 8192 BLCKSZ ?]
Date: 2000-12-01 14:25:41
Lists: pgsql-hackers
Kevin O'Gorman wrote:
> mlw wrote:
> >
> > Kevin O'Gorman wrote:
> > >
> > > mlw wrote:
> > > > Many operating systems use a fixed memory block size allocation for
> > > > their disk cache. They do not allocate a new block for every disk
> > > > request; they maintain a pool of fixed-size buffer blocks. So if you
> > > > use fewer bytes than the OS block size, you waste the difference between
> > > > your block size and the block size of the OS cache entry.
> > > >
> > > > I'm pretty sure Linux uses a 32K buffer size in its cache, and I'm
> > > > pretty confident that NT does as well from my previous tests.
> > >
> > > I dunno about NT, but here's a quote from "Linux Kernel Internals"
> > > 2nd Ed, page 92-93:
> > >     .. The block size for any given device may be 512, 1024, 2048 or
> > >     4096 bytes....
> > >
> > >     ... the buffer cache manages individual block buffers of
> > >     varying size.  For this, every block is given a 'buffer_head' data
> > >     structure. ...  The definition of the buffer head is in linux/fs.h
> > >
> > >     ... the size of this area exactly matches the block size 'b_size'...
> > >
> > > The quote goes on to describe how the data structures are designed to
> > > be processor-cache-aware.
> > >
> >
> > I double checked the kernel source, and you are right. I stand corrected
> > about the disk caching.
> >
> > My assertion stands: it is a negligible difference to read 32K vs 8K
> > from a disk, and the probability of data being within a 4 times larger
> > block is 4 times better, even though the probability of having the
> > correct block in memory is 4 times less. So, I don't think it is a
> > numerically significant issue.
> >
> My point is that it's going to depend strongly on what you're doing.
> If you're getting only one item from each block, you pay a cost in cache
> flushing even if the disk I/O time isn't much different.  You're carrying
> 3x unused bytes and displacing other, possibly useful, things from the
> cache.
> So whether it's a good thing or not is something you have to measure, not
> argue about.  Because it will vary depending on your workload.  That's
> where a DBA begins to earn his/her pay.

I would tend to disagree "in general." One can always find more optimal
ways to search data if one knows the nature of the data and the nature
of the search beforehand. The nature of the data could be knowledge of
whether it is sorted along the lines of the type of search you want to
do. It could be knowledge of the entirety of the data, and so on.

The cost difference between 32K and 8K disk reads/writes is so small
these days, compared with the overall cost of the disk operation itself,
that you can hardly even measure it; it is well below 1%. Remember that
the seek times advertised for disks are averages.
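A rough way to check this kind of claim is to model one random read as average seek time, plus average rotational latency (half a revolution), plus the time to transfer the block. This is a back-of-envelope sketch only: the drive figures in it are hypothetical placeholders, not measurements, and it ignores on-drive caches, read-ahead, and OS buffering. The relative difference between 8K and 32K depends on how the transfer rate compares with seek plus rotation for the particular drive, so the real answer still has to come from benchmarking.

```python
# Back-of-envelope model of one random disk read:
#   seek + average rotational latency + transfer of the block.
# All drive figures used below are hypothetical placeholders; substitute
# the figures for your own drive.


def read_time_ms(block_bytes: int, seek_ms: float, rpm: int, mb_per_s: float) -> float:
    """Estimated time in milliseconds for one random read of block_bytes."""
    rotational_latency_ms = 0.5 * 60_000.0 / rpm  # on average, half a revolution
    transfer_ms = block_bytes / (mb_per_s * 1_000_000) * 1000.0  # MB = 10**6 bytes
    return seek_ms + rotational_latency_ms + transfer_ms


def relative_extra_cost(small: int, large: int, **drive) -> float:
    """Fractional increase in per-read time when reading `large` instead of `small` bytes."""
    t_small = read_time_ms(small, **drive)
    t_large = read_time_ms(large, **drive)
    return (t_large - t_small) / t_small


if __name__ == "__main__":
    # Hypothetical drive: 8.5 ms average seek, 7200 RPM, 25 MB/s sustained transfer.
    drive = dict(seek_ms=8.5, rpm=7200, mb_per_s=25.0)
    for size in (8192, 16384, 24576, 32768):
        print(f"{size // 1024:>2}K: {read_time_ms(size, **drive):.2f} ms per random read")
    print(f"32K vs 8K: {relative_extra_cost(8192, 32768, **drive):+.1%}")
```

Plugging in datasheet seek time, spindle speed, and sustained transfer rate for a specific drive gives that drive's estimate; faster transfer relative to positioning time shrinks the gap.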

SQL itself is a compromise between a hand-coded search program and a
general-purpose solution. For a general-purpose search system, one cannot
conclude that data is less likely to be found in cache when blocks are
larger than when they are smaller.

There are just as many cases where one could make an argument for one
versus the other based on the nature of the data and the nature of the
search.

However, that being said, memory DIMMs are 256M for $100, and time is
priceless. The 8K default has been there as long as I can remember having
to think about it, and only recently did I learn it can be changed. I
have been using Postgres since about 1996.

I argue that reading or writing 32K is, for all practical purposes, not
measurably different from reading or writing 8K. The sole point in your
argument is that with a 4x larger block you have 1/4 the chance that a
given block will be in memory.

I argue that with a 4x larger block size, you have a 4x greater chance
that the data you want will be in any given block, and that this offsets
the 1/4 chance of that block being in cache.

The likelihood of something being in cache is directly proportional to
the ratio of the size of the cache to the size of the whole object being
cached, and depends on the algorithms used to decide what remains in cache.
Typically this is a combination of LRU, frequency, and some predictive
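Under the simplest assumptions the two effects described above cancel exactly. Suppose, hypothetically, a table of T bytes, a cache holding C < T bytes, a block size b, and lookups that pick rows uniformly at random and independently of each other. In steady state the cache holds C/b of the table's T/b blocks, and each lookup's block is equally likely to be any of them:

```latex
P(\text{hit}) = \frac{C/b}{T/b} = \frac{C}{T},
\qquad
P(\text{hit})\big|_{b \to 4b} = \frac{C/(4b)}{T/(4b)} = \frac{C}{T}
```

Quadrupling b cuts the number of cached blocks by four but makes each block cover four times as many rows, so the hit probability stays C/T. The result leans entirely on the uniform-access assumption; when successive lookups touch neighbouring rows, larger blocks can raise the hit rate.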

Small databases may, in fact, reside entirely in disk cache because of
the amount of RAM on modern machines. Large databases cannot be
entirely cached, and only some small percentage of them will be in cache
at any given time. Depending on the "randomness" of the search criteria,
the probability of the item you wish to locate being in cache has, as
far as I can see, little to do with the block size.
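The same comparison can be simulated. This is an illustration under stated assumptions, not a model of PostgreSQL's buffer manager or of any OS cache: it uses a plain LRU cache whose capacity is fixed in bytes, fixed-size rows packed into blocks, and two hypothetical workloads (uniformly random single-row lookups, and runs of eight neighbouring rows). The table size, row size, and cache size are made-up values.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache of block numbers (capacity counted in blocks)."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.blocks: OrderedDict = OrderedDict()

    def access(self, block: int) -> bool:
        """Touch `block`; return True on a hit, False on a miss (which loads it)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def hit_rate(row_ids, row_bytes: int, block_bytes: int, cache_bytes: int) -> float:
    """Fraction of row lookups served from an LRU cache of cache_bytes."""
    rows_per_block = block_bytes // row_bytes
    cache = LRUCache(cache_bytes // block_bytes)
    hits = sum(cache.access(r // rows_per_block) for r in row_ids)
    return hits / len(row_ids)


if __name__ == "__main__":
    rng = random.Random(42)
    ROW_BYTES = 128                   # hypothetical fixed row size
    TABLE_ROWS = 1_000_000            # ~128 MB table (hypothetical)
    CACHE_BYTES = 16 * 1024 * 1024    # 16 MB cache (hypothetical)
    N = 200_000

    uniform = [rng.randrange(TABLE_ROWS) for _ in range(N)]
    # "Clustered" workload: each lookup is followed by 7 neighbouring rows.
    clustered = []
    while len(clustered) < N:
        start = rng.randrange(TABLE_ROWS - 8)
        clustered.extend(range(start, start + 8))
    clustered = clustered[:N]

    for kb in (8, 16, 24, 32):
        b = kb * 1024
        print(f"{kb:>2}K  uniform={hit_rate(uniform, ROW_BYTES, b, CACHE_BYTES):.3f}"
              f"  clustered={hit_rate(clustered, ROW_BYTES, b, CACHE_BYTES):.3f}")
```

With the uniform workload the hit rates for all four block sizes should land close to the cache-to-table ratio, consistent with the reasoning above; the clustered workload is where block size starts to matter. Real workloads sit somewhere in between, which is why the numbers have to be measured rather than argued.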

I am going to see if I can get some time together this weekend to check
whether the benchmark programs measure a difference between block sizes,
and if so, compare them. I will try to test 8K, 16K, 24K, and 32K.


