Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dimitrios Apostolou <jimis(at)gmx(dot)net>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date: 2025-10-13 03:33:06
Message-ID: CBF70398-279C-4BA7-9FA1-2879A49F4EC9@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Oct 12, 2025, at 09:25, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> While playing around with the test cases for pg_dump compression,
> I was startled to discover that the performance of compress_lz4's
> "stream API" code is absolutely abysmal. Here is a simple test
> case to demonstrate, using the regression database as test data:
>
> $ pg_dump -Fd --compress=lz4 -f rlz4.dir regression
> $ time pg_restore -f /dev/null rlz4.dir
>
> real 0m0.023s
> user 0m0.017s
> sys 0m0.006s
>
> So far so good, but now let's compress the toc.dat file:
>
> $ lz4 -f -m --rm rlz4.dir/toc.dat
> $ time pg_restore -f /dev/null rlz4.dir
>
> real 0m1.335s
> user 0m1.326s
> sys 0m0.008s
>
> Considering that lz4 prides itself on fast decompression speed,
> that is not a sane result. Decompressing the file only requires
> a couple ms on my machine:
>
> $ time lz4cat rlz4.dir/toc.dat.lz4 >/dev/null
>
> real 0m0.002s
> user 0m0.000s
> sys 0m0.002s
>
> So on this example, pg_restore is something more than 600x slower
> to read the TOC data than it ought to be.
>

I tested the patch on my MacBook M4 (Sequoia 15.6.1), compiled with -O2 optimization:

- For LZ4 performance improvement:

Without the fix (latest master):
```
chaol(at)ChaodeMacBook-Air tmp % pg_dump -Fd --compress=lz4 -f rlz4.dir regression

chaol(at)ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir 0.03s user 0.03s system 11% cpu 0.463 total

chaol(at)ChaodeMacBook-Air tmp % lz4 -f -m --rm rlz4.dir/toc.dat

# It took 1.59s to restore from compressed data, much slower than from uncompressed data.
chaol(at)ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir 1.59s user 0.02s system 97% cpu 1.653 total
```

With the fix:
```
chaol(at)ChaodeMacBook-Air tmp % pg_dump -Fd --compress=lz4 -f rlz4.dir regression

chaol(at)ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir 0.02s user 0.03s system 16% cpu 0.305 total

chaol(at)ChaodeMacBook-Air tmp % lz4 -f -m --rm rlz4.dir/toc.dat

# Much faster !!!
chaol(at)ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir 0.02s user 0.02s system 93% cpu 0.043 total
```

- For Gzip zero read bug:

Without the fix:
```
chaol(at)ChaodeMacBook-Air tmp % pg_dump -Fd --compress=gzip -f rgzip.dir regression
chaol(at)ChaodeMacBook-Air tmp % gzip rgzip.dir/toc.dat

# Failed
chaol(at)ChaodeMacBook-Air tmp % pg_restore -f /dev/null rgzip.dir
pg_restore: error: could not read from input file:
```

With the fix:
```
chaol(at)ChaodeMacBook-Air tmp % pg_dump -Fd --compress=gzip -f rgzip.dir regression
chaol(at)ChaodeMacBook-Air tmp % gzip rgzip.dir/toc.dat

# Ran successfully
chaol(at)ChaodeMacBook-Air tmp % pg_restore -f /dev/null rgzip.dir
```

I also reviewed the code change, basically LGTM. Just a couple of trivial comments:

1 - 0001
In WriteDataToArchiveLZ4()

```
+ required = LZ4F_compressBound(chunk, &state->prefs);
+ if (required > state->buflen - state->bufdata)
+ {
+ cs->writeF(AH, state->buffer, state->bufdata);
+ state->bufdata = 0;
+ }
```

And in EndCompressorLZ4()
```
+ required = LZ4F_compressBound(0, &state->prefs);
+ if (required > state->buflen - state->bufdata)
+ {
+ cs->writeF(AH, state->buffer, state->bufdata);
+ state->bufdata = 0;
+ }
```

These two code pieces are similar, only difference is the first parameter passed to LZ4F_compressBound().

I think we can create an inline function for it. But these are just 5 lines, I don’t have a strong option on that.

Same thing for the other code pieces in LZ4Stream_write() and LZ4Stream_close().

2 - 0003
```
/*
* LZ4 equivalent to feof() or gzeof(). Return true iff there is no
- * decompressed output in the overflow buffer and the end of the backing file
```

This doesn’t belong to the current patch. But “iff” seems a typo of “if”. You may fix it as you are touching this piece of code.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2025-10-13 03:39:31 Re: [PATCH v1] parallel pg_restore: avoid disk seeks when jumping short distance forward
Previous Message Xuneng Zhou 2025-10-13 03:20:19 Re: pgstattuple: Use streaming read API in pgstatindex functions