CREATE DATABASE with filesystem cloning

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: CREATE DATABASE with filesystem cloning
Date: 2023-10-07 05:51:45
Message-ID: CA+hUKGLM+t+SwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w@mail.gmail.com
Lists: pgsql-hackers

Hello hackers,

Here is an experimental POC of fast/cheap database cloning. For
clones from little template databases, no one cares much, but it might
be useful to be able to create a snapshot or fork of a very large
database for testing/experimentation, like this:

create database foodb_snapshot20231007 template=foodb strategy=file_clone

It should be a lot faster, and use less physical disk, than the two
existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS, and
APFS (= macOS). With more work, it could in theory be extended to
other systems that invented different system calls for this (Solaris,
Windows). Extra physical disk space is then consumed only as the two
clones diverge.

It's just like the old strategy=file_copy, except it asks the OS to do
its best copying trick. If you try it on a system that doesn't
support copy-on-write, then copy_file_range() should fall back to
plain old copy, but it might still be better than we could do, as it
can push copy commands to network storage or physical storage.
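
To illustrate the fallback behaviour described above, here is a minimal
sketch in Python (not the patch's C code, and not taken from the patch).
The helper name clone_or_copy and its return labels are made up for this
example. It assumes os.copy_file_range is only present where Python
exposes it (Linux, per the Python docs) and falls back to an ordinary
userspace copy if the call is missing or fails.

```python
import os
import shutil


def clone_or_copy(src_path, dst_path, chunk=1 << 30):
    """Copy src_path to dst_path, preferring the kernel's copy_file_range().

    Hypothetical helper for illustration only. Returns a label saying
    which path was taken.
    """
    copy_range = getattr(os, "copy_file_range", None)
    if copy_range is not None:
        try:
            with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
                remaining = os.fstat(src.fileno()).st_size
                while remaining > 0:
                    # The kernel may share blocks (copy-on-write), offload the
                    # copy to storage, or just copy in-kernel; the caller
                    # cannot tell which from the return value.
                    n = copy_range(src.fileno(), dst.fileno(),
                                   min(chunk, remaining))
                    if n == 0:
                        break  # source was truncated while copying
                    remaining -= n
            return "copy_file_range"
        except OSError:
            pass  # e.g. unsupported filesystem or kernel; fall through
    # Plain userspace copy, rewriting dst_path from scratch.
    shutil.copyfile(src_path, dst_path)
    return "userspace"
```

Either way the destination ends up with the same bytes; only the cost
differs, which is why the caller does not need to know whether the
filesystem supports copy-on-write.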

Therefore, the usual caveats from strategy=file_copy also apply here:
it has to perform checkpoints, which could be very expensive, and
there are some quirks/brokenness around concurrent backups and PITR.
That makes me wonder if it's worth pursuing this idea. Thoughts?

I tested on bleeding edge FreeBSD/ZFS, where you need to set sysctl
vfs.zfs.bclone_enabled=1 to enable the optimisation, as it's a very
new feature that is still being rolled out. The system call succeeds
either way, but that setting controls whether the new database
initially shares blocks on disk or gets new copies. I also tested on
a Mac. In both cases I could clone large databases in a fraction of a
second.

Attachment: 0001-WIP-CREATE-DATABASE-.-STRATEGY-FILE_CLONE.patch (text/x-patch, 9.5 KB)
