Re: Horrible CREATE DATABASE Performance in High Sierra

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Brent Dearth <brent(dot)dearth(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Horrible CREATE DATABASE Performance in High Sierra
Date: 2017-10-02 22:33:17
Message-ID: 3518.1506983597@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> To demonstrate what I'm observing here, on linux with a fairly fast ssd:
> ...

I tried to replicate this test as closely as I could on the Mac hardware
I have lying about. I only bothered with the synchronous_commit=off
case, though, since you say that shows the worst effects. I used the
same parameters you did and the same pgbench settings. I attach the
pgbench output for six cases, flush_after disabled or enabled on three
different machines:

(A) 2016 MacBook Pro, 2.7GHz i7 + SSD, Sierra, HFS+ file system
(B) 2013 MacBook Pro, 2.3GHz i7 + SSD, High Sierra, APFS file system
(C) 2012 Mac Mini, 2.3GHz i7 + 5400-RPM SATA, High Sierra, HFS+ file system

There is some benefit on the SSD machines, but it's in the range of a
few percent --- clearly, these kernels are not as subject to the basic
I/O-scheduling problem as Linux is. The spinning-rust machine shows a
nice gain in overall TPS with flush enabled, but it's actually a bit
worse off in terms of the worst-case slowdown --- note that only that
case shows things coming to a complete halt. It'd be interesting to
check the behavior of a pre-High-Sierra kernel with spinning rust,
but I don't have any remotely modern machine answering that description.

I'm kind of surprised that machine B doesn't show obvious tanking in this
test given that msync() makes it suck so badly at copying a database.
I wonder what is different from the kernel's standpoint ... maybe the
sheer number of different files mmap'd by a single process during the
copy?

> What I'm basically wondering is whether we're screwing somebody over
> that made the effort to manually configure this on OSX. It's fairly
> obvious we need to find a way to disable the msync() by default.

I suspect that anybody who cares about DB performance on macOS will
be running it on SSD-based hardware these days. The benefit seen on
the Mac Mini would have been worth the trouble of a custom configuration
a few years ago, but I'm dubious that it matters in the real world
anymore.

If we could arrange to not use pg_flush_data in copydir.c on macOS,
I'd be okay with leaving it alone for the configurable flush_after
calls. But I can't think of a way to do that that wouldn't be a
complete kluge. I don't much want to do

+#ifndef __darwin__
 	pg_flush_data(dstfd, offset, nbytes);
+#endif

but I don't see any better alternative ...
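
For illustration only, here is a rough sketch of how the platform test could be kept out of the copy loop by hiding it behind a small wrapper. Everything in it is an assumption made for the example: COPY_FLUSH_ENABLED, copy_flush_hint, and counting_flush are hypothetical names, not PostgreSQL symbols, and the callback merely stands in for pg_flush_data(fd, offset, nbytes). It is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical compile-time switch: skip flush hints on macOS.
 * __darwin__ is the symbol used in the snippet above; __APPLE__ is the
 * compiler's own predefined macro and is checked here as a fallback.
 */
#if defined(__darwin__) || defined(__APPLE__)
#define COPY_FLUSH_ENABLED 0
#else
#define COPY_FLUSH_ENABLED 1
#endif

/* Callback type standing in for pg_flush_data(fd, offset, nbytes). */
typedef void (*flush_fn) (int fd, off_t offset, off_t nbytes);

/* Demonstration stub: counts how often a flush hint was issued. */
static int	flush_calls = 0;

static void
counting_flush(int fd, off_t offset, off_t nbytes)
{
	(void) fd;
	(void) offset;
	(void) nbytes;
	flush_calls++;
}

/*
 * Issue a flush hint only where the platform switch allows it.
 * Returns true if the callback was actually invoked.
 */
static bool
copy_flush_hint(flush_fn flush, int fd, off_t offset, off_t nbytes)
{
	assert(nbytes >= 0);

	if (!COPY_FLUSH_ENABLED || flush == NULL)
		return false;

	flush(fd, offset, nbytes);
	return true;
}
```

The copy loop would then call copy_flush_hint(counting_flush, dstfd, offset, nbytes) unconditionally, with the real flush routine in place of the stub. That only moves the #ifdef somewhere else, so it is arguably still a kluge, just a more contained one.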

regards, tom lane

Attachment         Content-Type  Size
maca-noflush.txt   text/plain    5.8 KB
maca-flush.txt     text/plain    5.8 KB
macb-noflush.txt   text/plain    5.8 KB
macb-flush.txt     text/plain    5.8 KB
macc-noflush.txt   text/plain    5.9 KB
macc-flush.txt     text/plain    5.9 KB
