Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dimitrios Apostolou <jimis(at)gmx(dot)net>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date: 2025-10-14 05:36:07
Message-ID: 09C1DFFE-6078-4C8E-B38D-A4D5CE087433@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Oct 14, 2025, at 10:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> This won't show the effect, because pg_dump will be able to go back
> and insert data offsets into the dump's TOC, so pg_restore can just
> seek to where the data is. See upthread discussion about what's
> needed to provoke Dimitrios' problem.
>
> I tried this very tiny (relatively speaking) test case:
>
> regression=# create database d1;
> CREATE DATABASE
> regression=# \c d1
> You are now connected to database "d1" as user "postgres".
> d1=# create table alpha as select repeat(random()::text, 1000) from generate_series(1,1000000);
> SELECT 1000000
> d1=# create table omega as select 42 as x;
> SELECT 1
> d1=# \q
>
> Then
>
> $ pg_dump -Fc d1 | cat >d1.dump
> $ time pg_restore -f /dev/null -t omega d1.dump
>
> The point of the pipe-to-cat is to reproduce Dimitrios' problem case
> with no data offsets in the TOC. Then the restore is doing about the
> simplest thing I can think of to make it skip over most of the archive
> file. Also, I'm intentionally using the default choice of gzip
> because that already responds to DEFAULT_IO_BUFFER_SIZE properly.
> (This test is with current HEAD, no patches except adjusting
> DEFAULT_IO_BUFFER_SIZE.)
>
> I got these timings:
>
> DEFAULT_IO_BUFFER_SIZE = 1K
> real 0m0.020s
> user 0m0.002s
> sys 0m0.017s
>
> DEFAULT_IO_BUFFER_SIZE = 4K
> real 0m0.014s
> user 0m0.003s
> sys 0m0.011s
>
> DEFAULT_IO_BUFFER_SIZE = 128K
> real 0m0.002s
> user 0m0.000s
> sys 0m0.002s
>
> This test case has only about 50MB worth of compressed data,
> so of course the times are very small; scaling it up to
> gigabytes would yield more impressive results. But the
> effect is clearly visible.
>

With your example, I can now see the difference, however, I had to create 5 more times of rows in the first table:

```
evantest=# CREATE TABLE alpha AS SELECT repeat(random()::text, 1000) FROM generate_series(1, 5000000);
SELECT 5000000
evantest=#
evantest=# CREATE TABLE omega AS SELECT 42 AS x;
SELECT 1
```

My test is with the patch, I only adjusted DEFAULT_IO_BUFFER_SIZE.

DEFAULT_IO_BUFFER_SIZE=4K
```
% /usr/bin/time pg_dump -Fc evantest | cat > d1.dump
294.83 real 220.28 user 45.90 sys

% /usr/bin/time pg_restore -f /dev/null -t omega d1.dump
0.16 real 0.02 user 0.09 sys
```

DEFAULT_IO_BUFFER_SIZE=128K
```
% /usr/bin/time pg_dump -Fc evantest | cat > d1.dump
296.89 real 220.85 user 46.64 sys

% /usr/bin/time pg_restore -f /dev/null -t omega d1.dump
0.01 real 0.00 user 0.00 sys
```

With bigger blocker size, pg_restore skips less blocks, so it gets faster.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-10-14 05:40:59 Re: Add LZ4 compression in pg_dump
Previous Message shveta malik 2025-10-14 05:21:11 Re: Logical Replication of sequences