Re: pg_upgrade --copy-file-range

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade --copy-file-range
Date: 2024-01-05 12:40:45
Message-ID: CAKZiRmyQ_F+OxHUi0+po9wnM=iwB0XUd=-ZT0ry_mOQJRnwmfA@mail.gmail.com
Lists: pgsql-hackers

Hi Thomas, Michael, Peter and -hackers,

On Sun, Dec 24, 2023 at 3:57 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Sat, Dec 23, 2023 at 09:52:59AM +1300, Thomas Munro wrote:
> > As it happens I was just thinking about this particular patch because
> > I suddenly had a strong urge to teach pg_combinebackup to use
> > copy_file_range. I wonder if you had the same idea...
>
> Yeah, +1. That would make copy_file_blocks() more efficient where the
> code is copying 50 blocks in batches because it needs to reassign
> checksums to the blocks copied.

I've tried to implement what you were discussing. This was actually my
first thought when I used pg_combinebackup with larger (realistic)
backup sizes back in December. Attached is a set of very DIRTY (!)
patches that add CoW options (--clone/--copy-file-range) to
pg_combinebackup (just like pg_upgrade, to keep the two in sync), while
also refactoring some related bits of code to avoid duplication.

On XFS (with reflink=1, which is the default) on Linux with kernel 5.10
and ~210GB backups, I'm getting:

root@jw-test-1:/xfs# du -sm *
210229 full
250 incr.1

Today in master, with the classic read()/write() loop and no
CoW/reflink optimization:

root@jw-test-1:/xfs# rm -rf outtest; sync; sync ; sync; \
    echo 3 | sudo tee /proc/sys/vm/drop_caches ; \
    time /usr/pgsql17/bin/pg_combinebackup \
    --manifest-checksums=NONE -o outtest full incr.1
3

real 49m43.963s
user 0m0.887s
sys 2m52.697s

vs. the patch with "--clone":

root@jw-test-1:/xfs# rm -rf outtest; sync; sync ; sync; \
    echo 3 | sudo tee /proc/sys/vm/drop_caches ; \
    time /usr/pgsql17/bin/pg_combinebackup \
    --manifest-checksums=NONE --clone -o outtest full incr.1
3

real 0m39.812s
user 0m0.325s
sys 0m2.401s

So it goes from 49 minutes down to about 40 seconds (!), +/-10s over
3 tries, if the FS supports CoW/reflinks (XFS, BTRFS, upcoming
bcachefs?). To me this suggests that anyone who actually wants to use
incremental backups (to get minimal RTO) would be wise to use a CoW
filesystem from the start, so that RTO is as low as possible.
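
For readers unfamiliar with the two mechanisms behind those options,
here is a minimal Linux-only sketch of what they do at the syscall
level. It is NOT the code from the attached patches: the names
copy_file_cow(), copy_ranged(), copy_buffered() and the copy_result
enum are hypothetical, and the automatic fallback chain is an
assumption made only for illustration (in the patches the strategy is
chosen explicitly through a switch). It tries a reflink clone via
ioctl(FICLONE), then copy_file_range(2), then a plain read()/write()
loop.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>		/* FICLONE */
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes, for illustration only. */
typedef enum { COPY_CLONED, COPY_RANGED, COPY_BUFFERED, COPY_FAILED } copy_result;

/* Classic userspace copy: every byte passes through a buffer. */
static int
copy_buffered(int src, int dst)
{
	char		buf[65536];
	ssize_t		nread;

	while ((nread = read(src, buf, sizeof(buf))) > 0)
	{
		ssize_t		done = 0;

		while (done < nread)
		{
			ssize_t		nw = write(dst, buf + done, nread - done);

			if (nw < 0)
				return -1;
			done += nw;
		}
	}
	return nread < 0 ? -1 : 0;
}

/* In-kernel copy; the filesystem may share extents instead of copying. */
static int
copy_ranged(int src, int dst)
{
	struct stat st;
	off_t		remaining;

	if (fstat(src, &st) < 0)
		return -1;
	remaining = st.st_size;
	while (remaining > 0)
	{
		ssize_t		n = copy_file_range(src, NULL, dst, NULL, remaining, 0);

		if (n < 0)
			return -1;
		if (n == 0)
			break;				/* source ended early */
		remaining -= n;
	}
	return 0;
}

/* Try the cheapest strategy first, then fall back to the next one. */
copy_result
copy_file_cow(const char *src_path, const char *dst_path)
{
	copy_result result = COPY_FAILED;
	int			src = open(src_path, O_RDONLY);
	int			dst;

	if (src < 0)
		return COPY_FAILED;
	dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (dst < 0)
	{
		close(src);
		return COPY_FAILED;
	}

	if (ioctl(dst, FICLONE, src) == 0)
		result = COPY_CLONED;	/* whole-file reflink, no data copied */
	else if (copy_ranged(src, dst) == 0)
		result = COPY_RANGED;
	else if (lseek(src, 0, SEEK_SET) == 0 && lseek(dst, 0, SEEK_SET) == 0 &&
			 ftruncate(dst, 0) == 0 && copy_buffered(src, dst) == 0)
		result = COPY_BUFFERED;	/* restarted from offset 0 */

	close(src);
	if (close(dst) != 0)
		result = COPY_FAILED;
	return result;
}
```

On a filesystem without reflink support the FICLONE ioctl fails and the
sketch drops to the next strategy, so the return value tells you which
path was actually taken.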

Random patch notes:
- the main meat is in v3-0002*; I hope I did not break anything serious
- in the worst case it is opt-in through a switch, so the user can
always stick to the classic copy
- no docs so far
- pg_copyfile_offload_supported() should actually be fixed if this is a
good path forward
- pgindent actually reindents larger areas of code than I would like;
any ideas, or is that OK?
- not tested on Win32/macOS/FreeBSD
- I've tested pg_upgrade manually and it seems to work and issue the
correct syscalls, however some tests are failing(?). I haven't
investigated why yet due to lack of time.

Any help is appreciated.

-J.

Attachment Content-Type Size
v3-0001-Add-copy_file_range-3-system-call-detection.-Futu.patch application/octet-stream 2.7 KB
v3-0002-Confine-various-OS-copy-on-write-and-other-copy-a.patch application/octet-stream 37.8 KB
v3-0003-Add-copy-file-range-to-pg_upgrade-using-pg_copyfi.patch application/octet-stream 65.0 KB
v3-0004-Add-clone-and-copy-file-range-copy-strategies-to-.patch application/octet-stream 36.4 KB
