file cloning in pg_upgrade and CREATE DATABASE

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: file cloning in pg_upgrade and CREATE DATABASE
Date: 2018-02-21 03:00:04
Message-ID: bc9ca382-b98d-0446-f699-8c5de2307ca7@2ndquadrant.com
Lists: pgsql-hackers

Here is another attempt at implementing file cloning for pg_upgrade and
CREATE DATABASE. The idea is to take advantage of file systems that can
make copy-on-write clones, which would make the copy run much faster.
For pg_upgrade, this will give the performance of --link mode without
the associated drawbacks.

There have been patches proposed previously [0][1]. The concerns there
were mainly that they required a Linux-specific ioctl() call and only
worked for Btrfs.

Some new things have happened since then:

- XFS has (optional) reflink support. This file system is probably more
widely used than Btrfs.

- Linux and glibc have a proper function to do this now.

- APFS on macOS supports file cloning.

So altogether this feature will be more widely usable and less ugly to
implement. Note, however, that you will currently need literally the
latest glibc release, so it probably won't be accessible right now
unless you are using Fedora 28 for example. (This is the
copy_file_range() function that had us recently rename the same function
in pg_rewind.)
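
For readers who have not used copy_file_range() yet, here is a minimal
sketch of how a file copy built on it might look. This is not the code
from the attached patch: the helper names clone_or_copy_file() and
copy_with_read_write() are made up for illustration, and the sketch
assumes Linux with a glibc new enough to provide the wrapper. Whether
the kernel satisfies the request with a copy-on-write clone depends on
the file system, so the sketch falls back to a plain read()/write() loop
when the call is not usable.

```c
#define _GNU_SOURCE             /* needed for copy_file_range() in glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read()/write() loop, used when copy_file_range() is not usable. */
static int
copy_with_read_write(int src_fd, int dst_fd)
{
    char        buf[65536];
    ssize_t     nread;

    while ((nread = read(src_fd, buf, sizeof(buf))) > 0)
    {
        ssize_t     done = 0;

        while (done < nread)
        {
            ssize_t     nwritten = write(dst_fd, buf + done, nread - done);

            if (nwritten < 0)
                return -1;
            done += nwritten;
        }
    }
    return (nread < 0) ? -1 : 0;
}

/*
 * Hypothetical helper (not from the patch): copy src to dst, letting the
 * kernel clone or copy in-kernel where it can.  Returns 0 on success,
 * -1 on error with errno set.
 */
int
clone_or_copy_file(const char *src, const char *dst)
{
    struct stat st;
    int         src_fd;
    int         dst_fd;
    int         rc = 0;
    int         need_fallback = 0;
    int         saved_errno;
    off_t       remaining = 0;

    src_fd = open(src, O_RDONLY);
    if (src_fd < 0)
        return -1;
    dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (dst_fd < 0)
    {
        saved_errno = errno;
        close(src_fd);
        errno = saved_errno;
        return -1;
    }

    if (fstat(src_fd, &st) < 0)
        rc = -1;
    else
        remaining = st.st_size;

    /* NULL offsets: the kernel uses and advances both file positions. */
    while (rc == 0 && remaining > 0)
    {
        ssize_t     n = copy_file_range(src_fd, NULL, dst_fd, NULL,
                                        (size_t) remaining, 0);

        if (n > 0)
            remaining -= n;
        else if (n == 0)
            break;              /* source is shorter than fstat() reported */
        else
        {
            /* Unsupported here: continue from the current file positions. */
            if (errno == ENOSYS || errno == EXDEV ||
                errno == EINVAL || errno == EOPNOTSUPP)
                need_fallback = 1;
            else
                rc = -1;
            break;
        }
    }

    if (rc == 0 && need_fallback)
        rc = copy_with_read_write(src_fd, dst_fd);

    saved_errno = errno;
    close(src_fd);
    if (close(dst_fd) < 0 && rc == 0)
        return -1;
    errno = saved_errno;
    return rc;
}
```

A caller would use it as clone_or_copy_file("old/base/1/1259",
"new/base/1/1259") (paths invented for the example) and check for a
return value of 0. On macOS a different library call would be needed;
this sketch does not cover that.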

Some example measurements:

- 6 GB database, pg_upgrade: unpatched 30 seconds, patched 3 seconds
  (XFS and APFS)

- Similar results for a CREATE DATABASE from a large template.

Even if you don't have a file system with cloning support, the special
library calls make copying faster. For example, on APFS, with the same
database as above, an unpatched CREATE DATABASE takes 30 seconds; with
the library call (but without cloning) it takes 10 seconds.

For amusement/bewilderment, without the recent flush optimization on
APFS, this takes 2 minutes 30 seconds. I suppose this optimization will
now actually be obsolete, since macOS will no longer hit that code.

[0]: https://www.postgresql.org/message-id/flat/513C0E7C.5080606%40socialserve.com

[1]: https://www.postgresql.org/message-id/flat/20140213030731.GE4831%40momjian.us
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Use-file-cloning-in-pg_upgrade-and-CREATE-DATABASE.patch text/plain 8.8 KB
