Re: O_DIRECT on macOS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: O_DIRECT on macOS
Date: 2021-07-20 00:26:27
Message-ID: 337210.1626740787@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
>> While I was here again, I couldn't resist trying to extend this to
>> Solaris, since it looked so easy. I don't have access, but I tested
>> on Illumos by undefining O_DIRECT. Thoughts?

> I can try that on the gcc farm in a bit.

Hmm, it compiles cleanly, but something seems drastically wrong,
because performance is just awful. On the other hand, I don't
know what sort of storage is underlying this instance, so maybe
that's to be expected? If I set fsync = off, the speed seems
comparable to what wrasse reports, but with fsync on it's like

test tablespace   ... ok    87990 ms
parallel group (20 tests, in groups of 1): boolean char name varchar text int2 int4 int8 oid float4 float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
     boolean      ... ok     3229 ms
     char         ... ok     2758 ms
     name         ... ok     2229 ms
     varchar      ... ok     7373 ms
     text         ... ok      722 ms
     int2         ... ok      342 ms
     int4         ... ok     1303 ms
     int8         ... ok     1095 ms
     oid          ... ok     1086 ms
     float4       ... ok     6360 ms
     float8       ... ok     5224 ms
     bit          ... ok     6254 ms
     numeric      ... ok    44304 ms
     txid         ... ok      377 ms
     uuid         ... ok     3946 ms
     enum         ... ok    33189 ms
     money        ... ok      622 ms
     rangetypes   ... ok    17301 ms
     pg_lsn       ... ok      798 ms
     regproc      ... ok      145 ms

(I stopped running it at that point...)

Also, the results of pg_test_fsync seem wrong; it refuses to run
tests for the cases we're interested in:

$ pg_test_fsync
5 seconds per test
DIRECTIO_ON supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                          n/a*
        fdatasync                            8.324 ops/sec  120139 usecs/op
        fsync                                0.906 ops/sec 1103936 usecs/op
        fsync_writethrough                     n/a
        open_sync                              n/a*
* This file system and its mount options do not support direct
  I/O, e.g. ext4 in journaled mode.

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                          n/a*
        fdatasync                            7.329 ops/sec  136449 usecs/op
        fsync                                0.788 ops/sec 1269258 usecs/op
        fsync_writethrough                     n/a
        open_sync                              n/a*
* This file system and its mount options do not support direct
  I/O, e.g. ext4 in journaled mode.

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write              n/a*
         2 *  8kB open_sync writes             n/a*
         4 *  4kB open_sync writes             n/a*
         8 *  2kB open_sync writes             n/a*
        16 *  1kB open_sync writes             n/a*

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close                 16.388 ops/sec   61020 usecs/op
        write, close, fsync                  9.084 ops/sec  110082 usecs/op

Non-sync'ed 8kB writes:
        write                            39855.686 ops/sec      25 usecs/op

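[Editorial sketch, not part of the original message.] To cross-check fsync and fdatasync figures like the ones above independently of pg_test_fsync, something along these lines might do. It is only a rough approximation and not how pg_test_fsync is implemented: it rewrites a single 8kB block and syncs after each write, then reports the same two units (ops/sec and usecs/op). The function name `time_sync_writes` and its parameters are hypothetical, and it does not exercise open_datasync, open_sync, or direct I/O at all.

```python
import os
import tempfile
import time

BLOCK_SIZE = 8192  # 8kB, the write size used in the report above


def time_sync_writes(sync, duration=1.0, directory=None):
    """Rewrite one 8kB block and call sync(fd) after each write.

    Runs for roughly `duration` seconds (always at least one write) and
    returns a tuple (ops_per_sec, usecs_per_op).
    """
    buf = b"\0" * BLOCK_SIZE
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        ops = 0
        start = time.perf_counter()
        while True:
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block each time
            os.write(fd, buf)
            sync(fd)  # e.g. os.fsync or os.fdatasync
            ops += 1
            elapsed = time.perf_counter() - start
            if elapsed >= duration:
                break
    finally:
        os.close(fd)
        os.unlink(path)
    return ops / elapsed, elapsed / ops * 1e6


if __name__ == "__main__":
    methods = {"fsync": os.fsync}
    if hasattr(os, "fdatasync"):  # not available on every platform
        methods["fdatasync"] = os.fdatasync
    for name, sync in methods.items():
        ops_per_sec, usecs_per_op = time_sync_writes(sync)
        print(f"{name:>12} {ops_per_sec:12.3f} ops/sec {usecs_per_op:10.0f} usecs/op")
```

Pass `directory=` to place the scratch file on the file system under test; the default temporary directory may be memory-backed and would then give misleadingly fast numbers.
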
regards, tom lane
