Sorry, no benchmark results in this post, but I do have a few notes to
pass along for folks:
1) FreeBSD -devel port now has configurable block sizes
2) 65K blocks fail, I think erroneously
3) The size of the postmaster proc and friends explodes to 45MB
4) effective_cache_size is a bad name for a GUC
5) Larger databases with lots of seqscans should use 32K pages, IMHO
1) For the FreeBSD folks in the crowd, it is possible to test non 8K
block sizes with the databases/postgresql-devel port.
WITH_PGBLOCKSIZE It's possible to change the blocksize to "16K"
or "32K" (defaults to 8K and different
block sizes require a dump, initdb, reload!)
# cd /usr/ports/databases/postgresql-devel
# make WITH_PGBLOCKSIZE=32K
Should be easy enough for folks to do. Values of 16K and 32K will
change the block size values. For those with the paranoid hat on,
I've also included the following message as a warning:
* DEFAULT BLOCK SIZE CONFIGURABLE *
Starting with 7.4, PostgreSQL's block size for the -devel port can be
changed from the default 8K blocks to either 16K or 32K blocks by
setting WITH_PGBLOCKSIZE to either "16K" or "32K". Changing block
sizes has uncertain performance implications and can result in faster
operation, in other cases slower. Benchmarking and testing your
installation is *highly recommended* before using these values in any
form of production! When changing block sizes, it is necessary to
dump, initdb, and reload data when changing block sizes. Be careful
when switching from databases/postgresql-devel and
databases/postgresql7 or from databases/postgresql-devel -> src
tarball builds that have different block sizes!
2) While testing the above, I noted that 65K blocks fail and 32K is as
big as PostgreSQL can handle, for one reason or another. When you try
to initdb (postmaster dies in RelationGetBufferForTuple() in
src/backend/access/heap/hio.c, on line 274):
### BEGIN ###
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "sean".
This user must also own the server process.
The database cluster will be initialized with locale C.
creating directory /usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data... ok
creating directory /usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/base... ok
creating directory /usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/global... ok
creating directory /usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/pg_xlog... ok
creating directory /usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/pg_clog... ok
selecting default shared_buffers... 1000
selecting default max_connections... 100
creating configuration files... ok
creating template1 database in /usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/base/1... PANIC: tuple is too big: size 268
Abort trap (core dumped)
### END ###
I'm not sure if this is a case of over-testing or not, however.
Someone with more PG heap foo than me will have to comment on this,
but, it seems as though the test at the bottom of
if (len > PageGetFreeSpace(pageHeader))
/* We should not get here given the test at the top */
elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
should be changed to:
if (len > MaxTupleSize)
based off of the comments. *shrug* I've spent a few min looking at
the code, and I'm not sure where the problem is. Jan or Tom?
3) Using 32K blocks, the postmaster takes a whopping 45MB of RAM! From
539 pgsql 96 0 44312K 12K select 0:00 0.00% 0.00% postgre
548 pgsql 96 0 44340K 12K select 0:00 0.00% 0.00% postgre
547 pgsql 96 0 45284K 12K select 0:00 0.00% 0.00% postgre
Nifty, huh? I haven't been able to figure out where the bloat is
coming from other than to mention that in production, with 8K blocks,
postmaster processes only take up 8MB in RAM. Anyone know where the
extra is coming from? I can only assume it's from some kind of page
cache, but I don't see where that's being set. Even then, I'd expect
only a 4X increase in proportion from going from 8K to 32K, not a 5x
4) effective_cache_size needs to be tuned accordingly because it is
based off of pages, not bytes, which I think isn't portable across
different installations with differing block sizes.
effective_cache_size should be effective_cache_pages and
effective_cache_size should be done in bytes and dynamically
calculate the effective_cache_pages that way the same config
applies across multiple, different block-size backends.
5) On the plus side, the 32K blocks have completely saturated my HDD
IO on the machines that I have tested, which is nice even though
the backend is larger. I'd think for large data warehouse
operations, 32K pages would be the way to go, esp when it comes
time to run reports. With 8K pages, I couldn't max out the IO, but
with 32K, it's very possible and much easier to do. Beyond that, I
don't have much else I want to comment on at the moment.
pgsql-hackers by date
|Next:||From: Hannu Krosing||Date: 2003-10-02 19:09:12|
|Subject: Re: Index/Function organized table layout|
|Previous:||From: James Rogers||Date: 2003-10-02 18:24:33|
|Subject: Re: Index/Function organized table layout|