Re: Direct I/O

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Direct I/O
Date: 2022-12-14 04:48:21
Message-ID: CA+hUKGJna_uWyE6dnaZ4VbAciRaOe=s4fp1_VFue5haN3CvcLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 2, 2022 at 11:54 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2022-11-02 09:44:30 +1300, Thomas Munro wrote:
> > On Wed, Nov 2, 2022 at 2:33 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > On Tue, Nov 01, 2022 at 08:36:18PM +1300, Thomas Munro wrote:
> > > > io_data_direct = whether to use O_DIRECT for main data files
> > > > io_wal_direct = ... for WAL
> > > > io_wal_init_direct = ... for WAL-file initialisation
> > >
> > > You added 3 booleans, but I wonder if it's better to add a string GUC
> > > which is parsed for comma separated strings.

Done as io_direct=data,wal,wal_init. Thanks Justin, this is better.
I resisted the urge to invent a meaning for 'on' and 'off', mainly
because it's not clear what values 'on' should enable and it'd be
strange to have off without on, so for now an empty string means off.
I suppose the meaning of this string could evolve over time: the names
of forks, etc.

> Perhaps we could use the guc assignment hook to transform the input value into
> a bitmask?

Makes sense. The only tricky question was where to store the GUC. I
went for fd.c for now, but it doesn't seem quite right...

> > > DIO is slower, but not so much that it can't run under CI. I suggest to
> > > add an 099 commit to enable the feature during development.
> >
> > Good idea, will do.

Done. The tests take 2-3x as long depending on the OS.

> Might be worth to additionally have a short tap test that does some basic
> stuff with DIO and leave that enabled? I think it'd be good to have
> check-world exercise DIO on dev machines, to reduce the likelihood of finding
> problems only in CI, which is somewhat painful.

Done.

> > > Note that this fails under linux with fsanitize=align:
> > > ../src/backend/storage/file/buffile.c:117:17: runtime error: member access within misaligned address 0x561a4a8e40f8 for type 'struct BufFile', which requires 4096 byte alignment
> >
> > Oh, so BufFile is palloc'd and contains one of these. BufFile is not
> > even using direct I/O, but by these rules it would need to be
> > palloc_io_align'd. I will think about what to do about that...
>
> It might be worth having two different versions of the struct, so we don't
> impose unnecessarily high alignment everywhere?

Done. I now have PGAlignedBlock (unchanged) and PGIOAlignedBlock.
You have to use the latter for SMgr, because I added alignment
assertions there. We might as well use it for any other I/O such as
frontend code too for a chance of a small performance boost as you
showed. For now I have not use PGIOAlignedBlock for BufFile, even
though it would be a great candidate for a potential speedup, only
because I am afraid of adding padding to every BufFile in scenarios
where we allocate many (could be avoided, a subject for separate
research).

V2 comprises:

0001 -- David's palloc_aligned() patch
https://commitfest.postgresql.org/41/3999/
0002 -- I/O-align almost all buffers used for I/O
0003 -- Add the GUCs
0004 -- Throwaway hack to make cfbot turn the GUCs on

Attachment Content-Type Size
v2-0001-Add-allocator-support-for-larger-allocation-align.patch text/x-patch 14.8 KB
v2-0002-Introduce-PG_IO_ALIGN_SIZE-and-align-all-I-O-buff.patch text/x-patch 24.8 KB
v2-0003-Add-io_direct-setting-developer-only.patch text/x-patch 20.4 KB
v2-0004-XXX-turn-on-direct-I-O-by-default-just-for-CI.patch text/x-patch 749 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-12-14 04:54:48 Re: Schema variables - new implementation for Postgres 15
Previous Message houzj.fnst@fujitsu.com 2022-12-14 04:19:58 RE: Perform streaming logical transactions by background workers and parallel apply