Re: pg_combinebackup --copy-file-range

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_combinebackup --copy-file-range
Date: 2024-04-07 17:46:59
Message-ID: 8055f67f-3f05-4559-ad7f-68a4bc8705aa@enterprisedb.com
Lists: pgsql-hackers

On 4/5/24 21:43, Tomas Vondra wrote:
> Hi,
>
> ...
>
> 2) The prefetching is not a huge improvement, at least not for these
> three filesystems (btrfs, ext4, xfs). From the color scale it might seem
> like it helps, but those values are relative to the baseline, so when
> the non-prefetching value is 5% and with prefetching 10%, that means the
> prefetching makes it slower. And that's very often true.
>
> This is visible more clearly in prefetching.pdf, comparing the
> non-prefetching and prefetching results for each patch, not to baseline.
> That makes it quite clear there's a lot of "red" where prefetching
> makes it slower. It certainly does help for larger increments (which
> makes sense, because the modified blocks are distributed randomly, and
> thus come from random files, making long streaks unlikely).
>
> I've imagined the prefetching could be made a bit smarter to ignore the
> streaks (=sequential patterns), but once again - this only matters with
> the batching, which we don't have. And without the batching it looks
> like a net loss (that's the first column in the prefetching PDF).
>
> I did start thinking about prefetching because of ZFS, where it was
> necessary to get decent performance. And that's still true. But (a) even
> with the explicit prefetching it's still 2-3x slower than any of these
> filesystems, so I assume performance-sensitive use cases won't use it.
> And (b) the prefetching seems necessary in all cases, no matter how
> large the increment is. Which goes directly against the idea of looking
> at how random the blocks are and prefetching only the sufficiently
> random patterns. That doesn't seem like a great thing.
>

I finally got more complete ZFS results, and I also decided to get
some numbers without the ZFS tuning I did. And boy oh boy ...

All the tests I did with ZFS were tuned the way I've seen recommended
when using ZFS for PostgreSQL, that is

zfs set recordsize=8K logbias=throughput compression=none

and this performed quite poorly - pg_combinebackup took 4-8x longer than
with the traditional filesystems (btrfs, xfs, ext4), and the only thing
that measurably improved it was prefetching.
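For completeness, that tuning (and the later revert to defaults) can be sketched per dataset like this - "tank/pgdata" is a made-up dataset name, `zfs set` takes the dataset as its final argument, and zfsprops spells the disable-compression value "off":

```shell
# Recommended-for-PostgreSQL tuning used in the earlier runs
# (tank/pgdata is a hypothetical dataset name):
zfs set recordsize=8K tank/pgdata
zfs set logbias=throughput tank/pgdata
zfs set compression=off tank/pgdata

# Revert to inherited/default values (recordsize goes back to 128K):
zfs inherit recordsize tank/pgdata
zfs inherit logbias tank/pgdata
zfs inherit compression tank/pgdata
```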

But once I reverted to the default recordsize of 128kB, the
performance is waaaaaay better - entirely comparable to ext4/xfs, while
btrfs remains faster with --copy-file-range --no-manifest (by a factor
of 2-3x).

This is quite clearly visible in the attached "current.pdf" which shows
results for the current master (i.e. filtered to the 3-reconstruct patch
adding CoW stuff to write_reconstructed_file).
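The CoW path relies on the kernel cloning extents instead of copying bytes, and the effect is easy to see from the shell on a reflink-capable filesystem. This is a sketch, not the patch's code; the file names are made up, and --reflink=auto falls back to a plain copy where reflinks aren't supported:

```shell
# Write a small source file, then clone it. On btrfs (or xfs with
# reflink=1) the clone shares the source's blocks, which is why the
# CoW variants are faster and use less disk; elsewhere --reflink=auto
# degrades to an ordinary copy, so these commands run anywhere.
printf 'reconstructed block data\n' > base.dat
cp --reflink=auto base.dat clone.dat
cmp base.dat clone.dat && echo "clone matches source"
```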

There are also some differences in disk usage, where ZFS seems to need
more space than xfs/btrfs (as if there was no block sharing), but maybe
that's due to how I measure this using df ...
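One possible explanation for the df discrepancy: on ZFS, df reports pool-level physical space and doesn't directly show compression or block-sharing savings, while the zfs tools do. Inspection commands along these lines would distinguish physical from logical usage ("tank/pgdata" is a made-up dataset name, and these obviously need a real pool to run):

```shell
# What the benchmark measured: pool-level view, physical space only.
df -h /tank/pgdata

# ZFS-aware accounting: "used" is physical (after compression and
# sharing), "logicalused" is what the data would take uncompressed.
zfs get used,logicalused,compressratio tank/pgdata
```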

I also tried a "completely default" ZFS configuration, with all
options left at the default (recordsize=128kB, compression=lz4, and
logbias=latency). That performs about the same, except that the disk
usage is lower thanks to the compression.

Note: because I'm a hip cool kid, I also ran the tests on bcachefs. The
results are included in the CSV/PDF attachments. In general it's much
slower than xfs/btrfs/ext4, and the disk space is somewhere in between
btrfs and xfs (for the CoW cases). We'll see how this improves as it
matures in the future.

The attachments are tables with the total duration / disk space usage,
and the impact of prefetching. The tables are similar to what I shared
before, except that the color scale is applied to the values directly.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
current.pdf application/pdf 45.9 KB
prefetch.pdf application/pdf 83.6 KB
results.csv text/csv 205.4 KB
size.pdf application/pdf 64.1 KB
duration.pdf application/pdf 77.1 KB
