Re: Initdb-time block size specification

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: David Christensen <david(dot)christensen(at)crunchydata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Initdb-time block size specification
Date: 2023-06-30 22:20:38
Message-ID: 20230630222038.itet32chzwjf6gc6@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-06-30 23:42:30 +0200, Tomas Vondra wrote:
> I wonder what are the conditions/options for disabling FPI. I kinda
> assume it'd apply to new drives with 4k sectors, with properly aligned
> partitions etc. But I haven't seen any particularly clear confirmation
> that's correct.

Yea, it's not trivial. And the downsides are also substantial from a
replication / crash recovery performance POV - even with reading blocks ahead
of WAL replay, it's hard to beat just sequentially reading nearly all the data
you're going to need.

> On 6/30/23 23:11, Andres Freund wrote:
> > If we really wanted to do this - but I don't think we do - I'd argue for
> > working on the buildsystem support to build the postgres binary multiple
> > times, for 4, 8, 16 kB BLCKSZ and having a wrapper postgres binary that just
> > exec's the relevant "real" binary based on the pg_control value. I really
> > don't see us ever wanting to make BLCKSZ runtime configurable within one
> > postgres binary. There's just too much intrinsic overhead associated with
> > that.
>
> How would that work for extensions which may be built for a particular
> BLCKSZ value (like pageinspect)?

I think we'd need to do something similar for extensions, likely loading them
from a path that includes the "subvariant" the server currently is running. Or
alternatively adding a suffix to the filename indicating the
variant. Something like pageinspect.x86-64-v4-4kB.so.
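
To make the suffix idea a bit more concrete, here is a minimal sketch of how
such a file name could be put together. Everything in it is an assumption for
illustration only: variant_library_name() is a hypothetical helper, not an
existing PostgreSQL function, and the "<name>.<isa level>-<N>kB.so" layout is
just the example from above, not a settled naming scheme.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): build a variant-specific
 * library file name of the form "<name>.<isa_level>-<N>kB.so", e.g.
 * "pageinspect.x86-64-v4-4kB.so".
 *
 * Returns 0 on success, -1 if the block size is not one of the variants
 * considered here or if the result does not fit into dst.
 */
static int
variant_library_name(char *dst, size_t dstlen,
                     const char *name, const char *isa_level,
                     unsigned int blcksz)
{
    int n;

    assert(dst != NULL && name != NULL && isa_level != NULL);

    /* Assumption: only the 4, 8 and 16 kB builds mentioned in the thread. */
    if (blcksz != 4096 && blcksz != 8192 && blcksz != 16384)
        return -1;

    n = snprintf(dst, dstlen, "%s.%s-%ukB.so",
                 name, isa_level, blcksz / 1024);

    /* snprintf reports the length it wanted; treat truncation as an error. */
    if (n < 0 || (size_t) n >= dstlen)
        return -1;

    return 0;
}
```

The server would presumably take the block size from pg_control and the ISA
level from whichever build is running, then look for that file before (or
instead of) the plain pageinspect.so.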

The x86-64-v* stuff is something gcc and clang added a couple years ago, so
that not every project has to define different "baseline" levels. I think it
was done in collaboration with the System V / Linux AMD64 ABI specification group
([1]).

Greetings,

Andres

[1] https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
section 3.1.1.
