Re: Direct I/O

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Direct I/O
Date: 2023-04-07 12:35:15
Message-ID: CA+hUKG+DoL6GZyUpoVhbfT2+N53g_DXZr5fvq1-Wh=aEtaY92w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 25, 2023 at 8:57 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> Thanks. I have some comments on
> v3-0002-Add-io_direct-setting-developer-only.patch:
>
> 1. I think we don't need to overwrite the io_direct_string in
> check_io_direct so that show_io_direct can be avoided.

Thanks for looking at this, and sorry for the late response. Yeah, agreed.

> 2. check_io_direct can leak the flags memory - when io_direct is not
> supported or for an invalid list syntax or an invalid option is
> specified.
>
> I have addressed my review comments as a delta patch on top of v3-0002
> and added it here as v1-0001-Review-comments-io_direct-GUC.txt.

Thanks. Your way is nicer. I merged your patch and added you as a co-author.

> Some comments on the tests added:
>
> 1. Is there a way to know if Direct IO for WAL and data has been
> picked up programmatically? IOW, can we know if the OS page cache is
> bypassed? I know an external extension pgfincore which can help here,
> but nothing in the core exists AFAICS.

Right, that extension can tell you how many pages are in the kernel
page cache which is quite interesting for this. I also once hacked up
something primitive to see *which* pages are in kernel cache, so I
could join that against pg_buffercache to measure double buffering,
when I was studying the "smile" shape where pgbench TPS goes down and
then back up again as you increase shared_buffers if the working set
is nearly as big as physical memory (code available in a link from
[1]).

Yeah, I agree it might be nice for human investigators to put
something like that in contrib/pg_buffercache, but I'm not sure you
could rely on it enough for an automated test, though, 'cause it
probably won't work on some file systems and the tests would probably
fail for random transient reasons (for example: some systems won't
kick pages out of kernel cache if they were already there, just
because we decided to open the file with O_DIRECT). (I got curious
about why mincore() wasn't standardised along with mmap() and all that
jazz; it seems the BSD and later Sun people who invented all those
interfaces didn't think that one was quite good enough[2], but every
(?) Unixoid OS copied it anyway, with variations... Apparently the
Windows thing is called VirtualQuery()).

> 2. Can we combine these two append_conf to a single statement?
> +$node->append_conf('io_direct', 'data,wal,wal_init');
> +$node->append_conf('shared_buffers', '64kB'); # tiny to force I/O

OK, sure, done. And also oops, that was completely wrong and not
working the way I had it in that version...

> 3. A nitpick: Can we split these queries multi-line instead of in a single line?
> +$node->safe_psql('postgres', 'begin; create temporary table t2 as
> select 1 as i from generate_series(1, 10000); update t2 set i = i;
> insert into t2count select count(*) from t2; commit;');

OK.

> 4. I don't think we need to stop the node before the test ends, no?
> +$node->stop;
> +
> +done_testing();

Sure, but why not?

Otherwise, I rebased, and made a couple more changes:

I found a line of the manual about wal_sync_method that needed to be removed:

- The <literal>open_</literal>* options also use
<literal>O_DIRECT</literal> if available.

In fact that sentence didn't correctly document the behaviour in
released branches (wal_level=minimal is also required for that, so
probably very few people ever used it). I think we should adjust that
misleading sentence in back-branches, separately from this patch set.

I also updated the commit message to highlight the only expected
user-visible change for this, namely the loss of the above incorrectly
documented obscure special case, which is replaced by the less obscure
new setting io_direct=wal, if someone still wants that behaviour.

Also a few minor comment changes.

[1] https://twitter.com/MengTangmu/status/994770040745615361
[2] http://kos.enix.org/pub/gingell8.pdf

Attachment Content-Type Size
v4-0001-Introduce-PG_IO_ALIGN_SIZE-and-align-all-I-O-buff.patch text/x-patch 24.3 KB
v4-0002-Add-io_direct-setting-developer-only.patch text/x-patch 21.2 KB
v4-0003-XXX-turn-on-direct-I-O-by-default-just-for-CI.patch text/x-patch 756 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-04-07 12:49:36 Re: cataloguing NOT NULL constraints
Previous Message Andrew Dunstan 2023-04-07 12:06:55 Re: [PATCH] Allow Postgres to pick an unused port to listen