Initdb-time block size specification

From: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Initdb-time block size specification
Date: 2023-06-30 17:35:09
Message-ID: CAOxo6XKx7DyDgBkWwPfnGSXQYNLpNrSWtYnK6-1u+QHUwRa1Gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

As discussed somewhat at PgCon, enclosed is v1 of a patch to provide
variable block sizes; basically instead of BLCKSZ being a compile-time
constant, a single set of binaries can support all of the block sizes
Pg can support, using the value stored in pg_control as the basis.
(Possible future plans would be to make this something even more
dynamic, such as configured per tablespace, but this is out of scope;
this just sets up the infrastructure for this.)

Whereas we had traditionally used BLCKSZ to indicate the compile-time selected
block size, this commit adjusted things so the cluster block size can be
selected at initdb time.

In order to code for this, we introduce a few new defines:

- CLUSTER_BLOCK_SIZE is the blocksize for this cluster itself. This is not
valid until BlockSizeInit() has been called in the given backend, which we do as
early as possible by parsing the ControlFile and using the blcksz field.

- MIN_BLOCK_SIZE and MAX_BLOCK_SIZE are the limits for the selectable block
size. It is required that CLUSTER_BLOCK_SIZE is a power of 2 between these two
constants.

- DEFAULT_BLOCK_SIZE is the moral equivalent of BLCKSZ; it is the built-in
default value. This is used in a few places that just needed a buffer of an
arbitrary size, but the dynamic value CLUSTER_BLOCK_SIZE should almost always be
used instead.

- CLUSTER_RELSEG_SIZE is used instead of RELSEG_SIZE, since internally we are
storing the segment size in terms of number of blocks. RELSEG_SIZE is still
kept, but is used in terms of the number of blocks of DEFAULT_BLOCK_SIZE;
CLUSTER_RELSEG_SIZE scales appropriately (and is the only thing used internally)
to keep the same target total segment size regardless of block size.

This patch uses a precalculated table to store the block size itself, as well as
additional derived values that have traditionally been compile-time
constants (example: MaxHeapTuplesPerPage). The traditional macro names are kept
so code that doesn't care about it should not need to change, however the
definition of these has changed (see the CalcXXX() routines in blocksize.h for
details).

A new function, BlockSizeInit() populates the appropriate values based on the
target block size. This should be called as early as possible in any code that
utilizes block sizes. This patch adds this in the appropriate place on the
handful of src/bin/ programs that used BLCKSZ, so this caveat mainly impacts new
code.

Code which had previously used BLCKZ should likely be able to get away with
changing these instances to CLUSTER_BLOCK_SIZE, unless you're using a structure
allocated on the stack. In these cases, the compiler will complain about
dynamic structure. The solution is to utilize an expression with MAX_BLOCK_SIZE
instead of BLCKSZ, ensuring enough stack space is allocated for the maximum
size. This also does require using CLUSTER_BLOCK_SIZE or an expression based on
it when actually using this structure, so in practice more stack space may be
allocated then used in principal; as long as there is plenty of stack this
should have no specific impacts on code.

Initial (basic) performance testing shows only minor changes with the pgbench -S
benchmark, though this is obviously an area that will need considerable
testing/verification across multiple workloads.

Thanks,

David

Attachment Content-Type Size
v1-0001-Introduce-initdb-selectable-block-sizes.patch application/octet-stream 321.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-06-30 17:41:06 Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?
Previous Message Drouvot, Bertrand 2023-06-30 17:32:50 Re: ProcessStartupPacket(): database_name and user_name truncation