Re: Initdb-time block size specification

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Initdb-time block size specification
Date: 2023-06-30 21:11:53
Lists: pgsql-hackers


On 2023-06-30 15:05:54 -0500, David Christensen wrote:
> > I am fairly certain this is going to be causing substantial performance
> > regressions. I think we should reject this even if we don't immediately find
> > them, because it's almost guaranteed to cause some.
> What would be considered substantial? Some overhead would be expected,
> but I think having an actual patch to evaluate lets us see what
> potential there is.

Anything beyond 1-2%, although even that imo is a hard sell.

> > Besides this, I've not really heard any convincing justification for needing
> > this in the first place.
> Doing this would open up experiments in larger block sizes, so we
> would be able to have larger indexable tuples, say, or to store data
> types that exceed the current tuple size limit without dropping to
> TOAST (native vector data types come to mind as a candidate here).

You can do experiments today with the compile-time option, which does not
require regressing performance for everyone.
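For reference, the existing compile-time knob looks like this (the size is
given in kB; 8 is the default, and this is build configuration, not something
switchable after initdb):

```shell
# autoconf build with 16 kB blocks
./configure --with-blocksize=16
make -j"$(nproc)"

# or the equivalent meson option
meson setup build -Dblocksize=16
```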

> We've had 8k blocks for a long time while hardware has improved over 20+
> years, and it would be interesting to see how tuning things would open up
> additional avenues for performance without requiring packagers to make a
> single choice on this regardless of use-case.

I suspect you're going to see more benefits from going to a *lower* setting
than a higher one. Some practical issues aside, plenty of storage hardware
these days would let you get rid of FPIs (full-page images in WAL) if you go
to 4k blocks (although it often requires explicit sysadmin action to reformat
the drive into that mode etc). But obviously that's problematic from the
"postgres limits" POV.
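The "explicit sysadmin action" is typically switching the drive's logical
block size, e.g. with nvme-cli (device name and LBA-format index here are
illustrative; check the id-ns output for your drive, and note the format
step destroys all data on the namespace):

```shell
# list the LBA formats the namespace supports (look for a 4096-byte one)
nvme id-ns /dev/nvme0n1 --human-readable

# destructively reformat to the 4096-byte LBA format, assumed to be index 1
# nvme format /dev/nvme0n1 --lbaf=1
```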

If we really wanted to do this - but I don't think we do - I'd argue for
working on the build-system support to build the postgres binary multiple
times, for 4, 8, 16 kB BLCKSZ, and having a wrapper postgres binary that just
exec's the relevant "real" binary based on the pg_control value. I really
don't see us ever wanting to make BLCKSZ runtime configurable within one
postgres binary. There's just too much intrinsic overhead associated with
making it runtime configurable.


Andres Freund
