From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, David Christensen <david(dot)christensen(at)crunchydata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net> |
Subject: | Re: Initdb-time block size specification |
Date: | 2023-06-30 22:59:09 |
Message-ID: | 20230630225909.ecthnlfvlnk3ij2k@awork3.anarazel.de |
Lists: | pgsql-hackers |
On 2023-06-30 18:37:39 -0400, Bruce Momjian wrote:
> On Sat, Jul 1, 2023 at 12:21:03AM +0200, Tomas Vondra wrote:
> > On 6/30/23 23:53, Bruce Momjian wrote:
> > > For a 4kB write, to say it is not partially written would be to require
> > > the operating system to guarantee that the 4kB write is not split into
> > > smaller writes which might each be atomic because smaller atomic writes
> > > would not help us.
> >
> > Right, that's the dance we do to protect against torn pages. But Andres
> > suggested that if you have modern storage and configure it correctly,
> > writing with 4kB pages would be atomic. So we wouldn't need to do this
> > FPI stuff, eliminating pretty significant source of write amplification.
>
> I agree the hardware is atomic for 4k writes, but do we know the OS
> always issues 4k writes?

When using a sector size of 4K you *can't* make smaller writes via normal
paths. The addressing unit is in sectors. The details obviously differ between
storage protocols, but you pretty much always just specify a start sector and a
number of sectors to be operated on.
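
To make the addressing model concrete, here is a minimal sketch of how a byte range has to be widened to whole sectors before it can be expressed as a start sector plus a sector count. `sector_request` is a made-up helper for illustration, not a kernel or driver API, and the 4096-byte default is an assumption matching the 4K case discussed here.

```python
from typing import Tuple


def sector_request(offset: int, length: int, sector_size: int = 4096) -> Tuple[int, int]:
    """Hypothetical helper: map a byte range to (start_sector, sector_count).

    The device only understands whole sectors, so any byte range is
    rounded outward to the sectors it touches.
    """
    if offset < 0 or length <= 0 or sector_size <= 0:
        raise ValueError("offset must be >= 0, length and sector_size must be > 0")
    first = offset // sector_size                 # sector holding the first byte
    last = (offset + length - 1) // sector_size   # sector holding the last byte
    return first, last - first + 1


# A 512-byte change inside a 4K sector still becomes a one-sector request.
print(sector_request(512, 512))
# A range straddling a sector boundary needs two sectors.
print(sector_request(4000, 200))
```

With 4K sectors there is simply no way to name "the second 512 bytes of sector 0" in such a request; the smallest expressible unit is the whole sector.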

Obviously the kernel could read 4k, modify 512 bytes in-memory, and then write
4k back, but that shouldn't be a danger here. There might also be debug
interfaces to allow reading/writing in different increments, but that'd not be
something happening during normal operation.
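
The read-modify-write case can be sketched with a toy in-memory device. Everything here (`ToyDisk`, `update_bytes`, the 4096-byte `SECTOR_SIZE`) is hypothetical and only models the behaviour described above; it is not how any real kernel implements it.

```python
SECTOR_SIZE = 4096  # assumed device sector size


class ToyDisk:
    """Toy in-memory stand-in for a device addressed in whole sectors."""

    def __init__(self, nsectors: int) -> None:
        self.data = bytearray(nsectors * SECTOR_SIZE)
        self.writes = []  # log of (start_sector, sector_count) seen by the device

    def read_sectors(self, start: int, count: int) -> bytes:
        lo = start * SECTOR_SIZE
        return bytes(self.data[lo:lo + count * SECTOR_SIZE])

    def write_sectors(self, start: int, buf: bytes) -> None:
        # The device rejects anything that is not a whole number of sectors.
        if len(buf) == 0 or len(buf) % SECTOR_SIZE != 0:
            raise ValueError("writes must cover whole sectors")
        lo = start * SECTOR_SIZE
        if lo + len(buf) > len(self.data):
            raise ValueError("write past end of device")
        self.data[lo:lo + len(buf)] = buf
        self.writes.append((start, len(buf) // SECTOR_SIZE))


def update_bytes(disk: ToyDisk, offset: int, payload: bytes) -> None:
    """Change a sub-sector byte range using only whole-sector I/O."""
    if not payload:
        raise ValueError("payload must not be empty")
    first = offset // SECTOR_SIZE
    last = (offset + len(payload) - 1) // SECTOR_SIZE
    buf = bytearray(disk.read_sectors(first, last - first + 1))  # read 4k
    rel = offset - first * SECTOR_SIZE
    buf[rel:rel + len(payload)] = payload                        # modify in memory
    disk.write_sectors(first, bytes(buf))                        # write 4k back


disk = ToyDisk(4)
update_bytes(disk, SECTOR_SIZE + 512, b"\xab" * 512)
print(disk.writes)
```

Even though only 512 bytes changed, the write log records a single request covering all of sector 1, which is the point: the device never sees a sub-sector write.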