Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date: 2025-06-10 21:47:59
Message-ID: aEioDuePEBRnfJYk@nathan
Lists: pgsql-hackers

On Mon, Jun 09, 2025 at 10:09:57PM +0200, Dimitrios Apostolou wrote:
> Fix by avoiding forward seeks for jumps of less than 1MB forward.
> Do instead sequential reads.
>
> Performance gain can be significant, depending on the size of the dump
> and the I/O subsystem. On my local NVMe drive, read speeds for that
> phase of pg_restore increased from 150MB/s to 3GB/s.

I was curious about what exactly was leading to the performance gains you
are seeing. This page has an explanation:

https://www.mjr19.org.uk/IT/fseek.html

I also wrote a couple of test programs to show the difference between
fseeko-ing and fread-ing through a file in steps of various sizes (n bytes
per step). On a Linux machine, I see this:

 log2(n) | fseeko  | fread
---------+---------+-------
       1 | 109.288 | 5.528
       2 |  54.881 | 2.848
       3 |   27.65 | 1.504
       4 |  13.953 | 0.834
       5 |     7.1 |  0.49
       6 |   3.665 | 0.322
       7 |   1.944 | 0.244
       8 |   1.085 | 0.201
       9 |   0.658 | 0.185
      10 |   0.443 | 0.175
      11 |   0.253 | 0.171
      12 |   0.102 | 0.162
      13 |   0.075 |  0.13
      14 |   0.061 | 0.114
      15 |   0.054 |   0.1

So, fseeko() starts winning around 4096 bytes. On macOS, the differences
aren't quite as dramatic, but 4096 bytes is the break-even point there,
too. I imagine there's a buffer around that size somewhere...
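
For reference, a comparison along these lines can be sketched as below. This
is only a minimal sketch, not the actual test programs: the function names
(walk_with_fseeko, walk_with_fread, time_walk) are made up for illustration,
and it assumes each step is either a plain relative fseeko() or an fread()
of the same number of bytes.

```c
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit glibc targets */
#define _POSIX_C_SOURCE 200809L  /* expose fseeko() and clock_gettime() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

/*
 * Walk the file from the start in steps of "step" bytes using relative
 * fseeko() calls.  Returns the number of steps taken, or -1 on error.
 */
long
walk_with_fseeko(FILE *fp, off_t file_size, size_t step)
{
    long steps = 0;

    assert(step > 0);
    if (fseeko(fp, 0, SEEK_SET) != 0)
        return -1;
    for (off_t pos = 0; pos + (off_t) step <= file_size; pos += (off_t) step)
    {
        if (fseeko(fp, (off_t) step, SEEK_CUR) != 0)
            return -1;
        steps++;
    }
    return steps;
}

/*
 * Same walk, but advance by reading "step" bytes and discarding them.
 * Returns the number of steps taken, or -1 on error or short read.
 */
long
walk_with_fread(FILE *fp, off_t file_size, size_t step)
{
    long  steps = 0;
    char *buf;

    assert(step > 0);
    if ((buf = malloc(step)) == NULL)
        return -1;
    if (fseeko(fp, 0, SEEK_SET) != 0)
    {
        free(buf);
        return -1;
    }
    for (off_t pos = 0; pos + (off_t) step <= file_size; pos += (off_t) step)
    {
        if (fread(buf, 1, step, fp) != step)
        {
            free(buf);
            return -1;
        }
        steps++;
    }
    free(buf);
    return steps;
}

/* Time one walk in seconds (monotonic clock); returns -1.0 if the walk failed. */
double
time_walk(long (*walk) (FILE *, off_t, size_t),
          FILE *fp, off_t file_size, size_t step)
{
    struct timespec start, end;
    long steps;

    clock_gettime(CLOCK_MONOTONIC, &start);
    steps = walk(fp, file_size, step);
    clock_gettime(CLOCK_MONOTONIC, &end);
    if (steps < 0)
        return -1.0;
    return (double) (end.tv_sec - start.tv_sec) +
        (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_walk() once per walker for each step size from 2 to 32768 bytes
on a large file would give a table shaped like the one above, although the
absolute numbers will depend on the file size, the stdio buffer size, and
the page cache state.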

This doesn't fully explain the results you are seeing, but it does seem to
validate the idea. I'm curious if you see further improvement with even
lower thresholds (e.g., 8KB, 16KB, 32KB).
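
To make the threshold idea concrete, here is a rough sketch of "read instead
of seek for short forward jumps". It is not the code from the patch:
skip_forward() and its threshold parameter are hypothetical names, and the
8kB scratch buffer is an arbitrary choice.

```c
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit glibc targets */
#define _POSIX_C_SOURCE 200809L  /* expose fseeko() */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Advance fp by "distance" bytes.  Jumps shorter than "threshold" are done
 * by reading and discarding data, which keeps the stdio buffer usable;
 * longer jumps use fseeko().  Returns 0 on success, -1 on error or if the
 * file ends before the requested distance has been read.
 */
int
skip_forward(FILE *fp, off_t distance, off_t threshold)
{
    char buf[8192];             /* scratch space, contents are discarded */

    assert(fp != NULL);
    if (distance < 0)
        return -1;

    if (distance >= threshold)
        return (fseeko(fp, distance, SEEK_CUR) == 0) ? 0 : -1;

    while (distance > 0)
    {
        size_t want = (distance < (off_t) sizeof(buf))
            ? (size_t) distance : sizeof(buf);

        if (fread(buf, 1, want, fp) != want)
            return -1;
        distance -= (off_t) want;
    }
    return 0;
}
```

With something like this, trying 8KB, 16KB, or 32KB instead of 1MB is just a
matter of passing a different threshold value.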

--
nathan
