using file cloning in create database / initdb

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: using file cloning in create database / initdb
Date: 2022-02-13 03:37:30
Message-ID: 20220213033730.bdlsyyqq52guaduo@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

This thread started at https://www.postgresql.org/message-id/20220213021746.GM31460%40telsasoft.com
but is mostly independent, so I split the thread off.

On 2022-02-12 20:17:46 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 06:00:44PM -0800, Andres Freund wrote:
> > I bet using COW file copies would speed up our own regression tests noticeably
> > - on slower systems we spend a fair bit of time and space creating template0
> > and postgres, with the bulk of the data never changing.
> >
> > Template databases are also fairly commonly used by application developers to
> > avoid the cost of rerunning all the setup DDL & initial data loading for
> > different tests. Making that measurably cheaper would be a significant win.
>
> +1
>
> I ran into this last week and was still thinking about proposing it.
>
> Would this help CI

It could theoretically help linux - but currently I think the filesystem for
CI is ext4, which doesn't support FICLONE. I assume it'd help macos, but I
don't know the performance characteristics of copyfile(). I don't think any of
the other OSs have working reflink / file clone support.
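
For illustration only, a minimal sketch of what a "try to clone, else
report failure" helper could look like. The name try_clone_file() is
hypothetical and not taken from any patch; the Linux branch uses the FICLONE
ioctl and the macOS branch uses copyfile() with COPYFILE_CLONE_FORCE, the
same calls pg_upgrade's clone mode relies on. The caller is assumed to fall
back to a regular copy when this returns -1 (e.g. on ext4).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/ioctl.h>
#include <linux/fs.h>           /* FICLONE, if the kernel headers have it */
#elif defined(__APPLE__)
#include <copyfile.h>           /* copyfile(), COPYFILE_CLONE_FORCE */
#endif

/*
 * Hypothetical helper: create dst as a copy-on-write clone of src.
 * Returns 0 on success, -1 with errno set if cloning is not possible
 * (unsupported filesystem, unsupported platform, I/O error, ...).
 */
int
try_clone_file(const char *src, const char *dst)
{
#if defined(__linux__) && defined(FICLONE)
    int         src_fd;
    int         dst_fd;
    int         rc;
    int         save_errno;

    if ((src_fd = open(src, O_RDONLY)) < 0)
        return -1;
    if ((dst_fd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
    {
        save_errno = errno;
        close(src_fd);
        errno = save_errno;
        return -1;
    }

    /* Ask the filesystem to share src's blocks with dst. */
    rc = ioctl(dst_fd, FICLONE, src_fd);
    save_errno = errno;
    close(src_fd);
    close(dst_fd);
    if (rc != 0)
    {
        /* Don't leave an empty file behind, e.g. on ext4. */
        unlink(dst);
        errno = save_errno;
        return -1;
    }
    return 0;
#elif defined(__APPLE__) && defined(COPYFILE_CLONE_FORCE)
    /* Fails rather than silently doing a full copy. */
    return copyfile(src, dst, NULL, COPYFILE_CLONE_FORCE) == 0 ? 0 : -1;
#else
    (void) src;
    (void) dst;
    errno = ENOTSUP;
    return -1;
#endif
}
```

Returning an error instead of silently copying keeps the decision about the
fallback with the caller, which is roughly the difference between
cp --reflink=always and cp --reflink=auto.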

You could prototype it for CI on macos by using the "template initdb" patch
and passing -c to cp.

On linux it might be worth using copy_file_range(), if supported, when file
cloning is not available. But that's kind of an even more separate topic...
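
A rough sketch of the copy_file_range() idea, assuming Linux with glibc 2.27
or newer. The function names copy_file_cfr() and copy_fallback() are made up
for this example and do not come from PostgreSQL. The loop lets the kernel
move the data (which may itself share blocks on filesystems that can), and
drops back to a plain read()/write() loop if the very first call fails, for
instance because the kernel or filesystem does not support it.

```c
#define _GNU_SOURCE             /* for copy_file_range() on glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Plain read()/write() loop, used when copy_file_range() is unusable. */
static int
copy_fallback(int src_fd, int dst_fd)
{
    char        buf[65536];
    ssize_t     nread;

    while ((nread = read(src_fd, buf, sizeof(buf))) != 0)
    {
        ssize_t     off = 0;

        if (nread < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        while (off < nread)
        {
            ssize_t     nwritten = write(dst_fd, buf + off, nread - off);

            if (nwritten < 0)
            {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += nwritten;
        }
    }
    return 0;
}

/* Hypothetical helper: copy src to dst, returns 0 on success, -1 on error. */
int
copy_file_cfr(const char *src, const char *dst)
{
    int         src_fd;
    int         dst_fd;
    int         rc = 0;
    int         copied_any = 0;
    struct stat st;
    off_t       remaining;

    if ((src_fd = open(src, O_RDONLY)) < 0)
        return -1;
    if (fstat(src_fd, &st) != 0 ||
        (dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    {
        close(src_fd);
        return -1;
    }

    remaining = st.st_size;
    while (remaining > 0)
    {
        /* NULL offsets: use and advance each descriptor's file position. */
        ssize_t     n = copy_file_range(src_fd, NULL, dst_fd, NULL,
                                        (size_t) remaining, 0);

        if (n > 0)
        {
            remaining -= n;
            copied_any = 1;
            continue;
        }
        if (n == 0)
            break;              /* source is shorter than fstat() reported */
        if (errno == EINTR)
            continue;
        /* Nothing copied yet, so both positions are still at 0. */
        rc = copied_any ? -1 : copy_fallback(src_fd, dst_fd);
        break;
    }

    close(src_fd);
    if (close(dst_fd) != 0)
        rc = -1;
    return rc;
}
```

Unlike an explicit clone request, this never fails just because the
filesystem cannot share blocks; it only changes how the bytes get copied.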

> or any significant fraction of buildfarm ?

Not sure how many are on new enough linux / mac to benefit and use a suitable
filesystem; I don't think we can see which FS the animals use. There are a few
animals with slow-ish storage but running fairly new linux. Those would likely
benefit the most.

> Or just tests run locally on supporting filesystems.

Probably depends on your storage subsystem. If not that fast, and running
tests concurrently, it'd likely help.

On my workstation, with lots of cores and very fast storage, using the initdb
caching patch modified to do cp --reflink=never / always yields the following
times for concurrent check-world (-j40 PROVE_FLAGS=-j4):

cp --reflink=never:

96.64user 61.74system 1:04.69elapsed 244%CPU (0avgtext+0avgdata 97544maxresident)k
0inputs+34124296outputs (2584major+7247038minor)pagefaults 0swaps
pcheck-world-success

cp --reflink=always:

91.79user 56.16system 1:04.21elapsed 230%CPU (0avgtext+0avgdata 97716maxresident)k
189328inputs+16361720outputs (2674major+7229696minor)pagefaults 0swaps
pcheck-world-success

Seems roughly stable across three runs.

Just comparing the time for cp -a of a fresh initdb'd cluster:

cp -a --reflink=never:

real 0m0.043s
user 0m0.000s
sys 0m0.043s

cp -a --reflink=always:

real 0m0.021s
user 0m0.004s
sys 0m0.018s

so that's a pretty nice win.

> Note that pg_upgrade already supports copy/link/clone. (Obviously, link
> wouldn't do anything desirable for CREATE DATABASE).

Yea. We'd likely have to move relevant code into src/port.

Greetings,

Andres Freund
