Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date: 2025-06-14 16:17:41
Message-ID: 72d2c81b-a0fe-3d99-9dd4-0d771e6e673e@gmx.net
Lists: pgsql-hackers

On Sat, 14 Jun 2025, Dimitrios Apostolou wrote:

> On Fri, 13 Jun 2025, Nathan Bossart wrote:
>
>> On Fri, Jun 13, 2025 at 01:00:26AM +0200, Dimitrios Apostolou wrote:
>>> By the way, I might have set the threshold to 1MB in my program, but
>>> lowering it won't show a difference in my test case, since the lseek()s I
>>> was noticing before the patch were mostly 8-16KB forward. Not sure what is
>>> the defining factor for that. Maybe the compression algorithm, or how wide
>>> the table is?
>>
>> I may have missed it, but could you share what the strace looks like with
>> the patch applied?
>
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 12288) = 12288
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 12288) = 12288
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 12288) = 12288
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 12288) = 12288
> read(4, "..."..., 4096) = 4096
> read(4, "..."..., 8192) = 8192
> read(4, "..."..., 4096) = 4096

This was from pg_restoring a zstd-compressed custom format dump.

Out of curiosity I tried the same with an uncompressed dump
(--compress=none). Surprisingly, the block size seems to be even smaller.

With my patched pg_restore I get only 4K reads and nothing else in
the strace output:

read(4, "..."..., 4096) = 4096
read(4, "..."..., 4096) = 4096
read(4, "..."..., 4096) = 4096
read(4, "..."..., 4096) = 4096
read(4, "..."..., 4096) = 4096
read(4, "..."..., 4096) = 4096
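
For readers following along, here is a minimal sketch of the idea in the
subject line: when the jump forward is short, read and discard the bytes
instead of calling lseek(). This is not the code from the patch. The
function name skip_forward(), the 8K scratch buffer and the
SKIP_READ_THRESHOLD constant are made up for illustration, and the 1MB
value is only the figure mentioned earlier for my test program.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical threshold: jumps up to this size are read through, not seeked. */
#define SKIP_READ_THRESHOLD ((off_t) 1024 * 1024)

/*
 * Advance the file position of fd by 'distance' bytes.
 * Short jumps are served by reading and discarding data, so the kernel
 * sees a purely sequential access pattern; long jumps use a single lseek().
 * Returns 0 on success, -1 on error or premature end of file.
 */
static int
skip_forward(int fd, off_t distance)
{
    char        buf[8192];      /* scratch buffer, contents are discarded */

    assert(distance >= 0);

    if (distance > SKIP_READ_THRESHOLD)
        return (lseek(fd, distance, SEEK_CUR) == (off_t) -1) ? -1 : 0;

    while (distance > 0)
    {
        size_t      want = (distance < (off_t) sizeof(buf))
            ? (size_t) distance : sizeof(buf);
        ssize_t     got = read(fd, buf, want);

        if (got < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (got <= 0)
            return -1;          /* read error, or EOF before the target */
        distance -= got;
    }
    return 0;
}
```

With a function like this, an strace of a run that only ever jumps a few
KB forward would show read() calls and no lseek() calls.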

The unpatched pg_restore gives me the weirdest output ever:

read(4, "..."..., 4096) = 4096
lseek(4, 98527916032, SEEK_SET) = 98527916032
lseek(4, 98527916032, SEEK_SET) = 98527916032
lseek(4, 98527916032, SEEK_SET) = 98527916032
lseek(4, 98527916032, SEEK_SET) = 98527916032
lseek(4, 98527916032, SEEK_SET) = 98527916032
lseek(4, 98527916032, SEEK_SET) = 98527916032
[ ... repeats about 80 times ...]
read(4, "..."..., 4096) = 4096
lseek(4, 98527920128, SEEK_SET) = 98527920128
lseek(4, 98527920128, SEEK_SET) = 98527920128
lseek(4, 98527920128, SEEK_SET) = 98527920128
lseek(4, 98527920128, SEEK_SET) = 98527920128
[ ... repeats ... ]

Seeing this, I think we should really consider raising the pg_dump block
size, as Tom suggested in a previous thread.

Dimitris
