Re: Direct I/O

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Direct I/O
Date: 2023-04-08 04:47:36
Message-ID: CA+hUKGJefS_-AHdNF9dFALhKgYyo9TGLVKcSMZuq6fFFKTqPXQ@mail.gmail.com
Lists: pgsql-hackers

I did some testing with non-default block sizes, and found a few minor
things that needed adjustment. The short version is that I blocked
some configurations that won't work or would break an assertion.
After a bit more copy-editing on docs and comments and a round of
automated indenting, I have now pushed this. I will now watch the
build farm. I tested on quite a few OSes that I have access to, but
this is obviously a very OS-sensitive kind of a thing.

The adjustments were:

1. If you set your BLCKSZ or XLOG_BLCKSZ smaller than
PG_IO_ALIGN_SIZE, you shouldn't be allowed to turn on direct I/O for
the relevant operations, because such undersized direct I/Os will fail
on common systems.

FATAL: invalid value for parameter "io_direct": "wal"
DETAIL: io_direct is not supported for WAL because XLOG_BLCKSZ is too small

FATAL: invalid value for parameter "io_direct": "data"
DETAIL: io_direct is not supported for data because BLCKSZ is too small

In fact some systems would be OK with it if the true requirement is
512 rather than 4096, but (1) tiny blocks are a niche build option
that doesn't even pass regression tests, (2) it's hard and totally
unportable to find out the true requirement at runtime, and (3) the
conservative choice of 4096 has additional benefits by matching memory
pages. So I think a conservative compile-time number is a good
starting position.

2. Previously I had changed the WAL buffer alignment to be the larger
of PG_IO_ALIGN_SIZE and XLOG_BLCKSZ, but in light of the above
thinking, I reverted that part. There is no point in aligning the
address of the buffer when the size is too small for direct I/O, and
that combination is now blocked off at GUC level, so we don't need any
change here.

3. I updated the md.c alignment assertions to allow for tiny blocks.
The point of these assertions is to fail if any new code does I/O from
badly aligned buffers even with io_direct turned off (i.e. how most
people hack), because that will fail with io_direct turned on. The
change is that I don't make the assertion if you're using BLCKSZ <
PG_IO_ALIGN_SIZE. Such buffers wouldn't work if used for direct I/O
but that's OK, the GUC won't allow it.

4. I made the language to explain where PG_IO_ALIGN_SIZE really comes
from a little vaguer because it's complex.
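
For illustration, adjustments 1 and 3 can be sketched roughly as
follows. This is a standalone approximation under stated assumptions,
not the committed code: the function names, the SKETCH_* flag bits and
the way the setting is passed in are all hypothetical, and only
PG_IO_ALIGN_SIZE, the value 4096 and the DETAIL strings are taken from
the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The conservative compile-time value discussed above. */
#define PG_IO_ALIGN_SIZE 4096

/* Hypothetical flag bits standing in for the parsed io_direct setting. */
#define SKETCH_IO_DIRECT_DATA 0x01
#define SKETCH_IO_DIRECT_WAL  0x02

/*
 * Adjustment 1 (sketch): refuse direct I/O for an operation whose block
 * size is smaller than PG_IO_ALIGN_SIZE.  Returns NULL if the setting is
 * acceptable, otherwise the DETAIL text to report.
 */
const char *
sketch_check_io_direct(int flags, size_t blcksz, size_t xlog_blcksz)
{
    if ((flags & SKETCH_IO_DIRECT_WAL) && xlog_blcksz < PG_IO_ALIGN_SIZE)
        return "io_direct is not supported for WAL because XLOG_BLCKSZ is too small";
    if ((flags & SKETCH_IO_DIRECT_DATA) && blcksz < PG_IO_ALIGN_SIZE)
        return "io_direct is not supported for data because BLCKSZ is too small";
    return NULL;
}

/*
 * Adjustment 3 (sketch): the alignment condition to assert before I/O.
 * Tiny blocks are exempt, because the check above already guarantees
 * that they are never used for direct I/O.
 */
bool
sketch_buffer_alignment_ok(const void *buffer, size_t blcksz)
{
    if (blcksz < PG_IO_ALIGN_SIZE)
        return true;
    return ((uintptr_t) buffer % PG_IO_ALIGN_SIZE) == 0;
}

/* Hypothetical stand-in for an I/O routine that makes the assertion. */
void
sketch_write_block(const void *buffer, size_t blcksz)
{
    /* Checked even when direct I/O is off, to catch bad callers early. */
    assert(sketch_buffer_alignment_ok(buffer, blcksz));
    (void) buffer;
    (void) blcksz;
    /* ... the real read or write would happen here ... */
}
```

With these definitions, a build with 1kB data blocks would be rejected
for "data" but could still use "wal" if XLOG_BLCKSZ is at least 4096,
and a misaligned 8kB buffer would trip the assertion whether or not
direct I/O is enabled.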
