Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date: 2025-10-12 01:25:20
Message-ID: 3686.1760232320@sss.pgh.pa.us
Lists: pgsql-hackers

While playing around with the test cases for pg_dump compression,
I was startled to discover that the performance of compress_lz4's
"stream API" code is absolutely abysmal. Here is a simple test
case to demonstrate, using the regression database as test data:

$ pg_dump -Fd --compress=lz4 -f rlz4.dir regression
$ time pg_restore -f /dev/null rlz4.dir

real 0m0.023s
user 0m0.017s
sys 0m0.006s

So far so good, but now let's compress the toc.dat file:

$ lz4 -f -m --rm rlz4.dir/toc.dat
$ time pg_restore -f /dev/null rlz4.dir

real 0m1.335s
user 0m1.326s
sys 0m0.008s

Considering that lz4 prides itself on fast decompression speed,
that is not a sane result. Decompressing the file requires only a
couple of milliseconds on my machine:

$ time lz4cat rlz4.dir/toc.dat.lz4 >/dev/null

real 0m0.002s
user 0m0.000s
sys 0m0.002s

So on this example, pg_restore is something more than 600x slower
to read the TOC data than it ought to be.

On investigation, the blame mostly affixes to LZ4Stream_read_overflow's
habit of memmove'ing all the remaining buffered data after each read
operation. Since reading a TOC file tends to involve a lot of small
(even one-byte) decompression calls, that amounts to an O(N^2) cost.
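For illustration, the troublesome pattern looks roughly like this (a
schematic sketch only, not the actual compress_lz4.c code): every read
copies out a few bytes and then shifts the entire unread remainder of
the buffer back to its start, so a long sequence of tiny reads moves
nearly the whole buffer again and again.

#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of the quadratic pattern.  With many one-byte
 * reads against a large buffered TOC, the memmove below re-shifts
 * almost the entire buffer on every call, giving O(N^2) total work.
 */
static size_t
overflow_read(char *dst, size_t len, char *buf, size_t *buflen)
{
    size_t n = (len < *buflen) ? len : *buflen;

    memcpy(dst, buf, n);
    memmove(buf, buf + n, *buflen - n); /* repeated full-buffer shift */
    *buflen -= n;
    return n;
}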

This could have been fixed with a minimal patch, but to my
eyes LZ4Stream_read_internal and LZ4Stream_read_overflow are
badly-written spaghetti code; in particular the eol_flag logic
is inefficient and duplicative. I chose to throw the code
away and rewrite from scratch. This version is about sixty
lines shorter as well as not having the performance issue.
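The standard cure for that pattern is to keep a read cursor into the
buffer and advance it, compacting or refilling only once the buffered
data is exhausted; the sketch below shows the general idea (the actual
code in the attached patch is structured differently and uses
pg_dump's own types and error handling).

#include <stddef.h>
#include <string.h>

/*
 * Sketch: cursor-based buffered read.  Each byte is moved at most a
 * constant number of times, so the cost is linear rather than
 * quadratic in the amount of data read.
 */
typedef struct
{
    char   *buf;        /* decompressed data */
    size_t  datalen;    /* bytes currently buffered */
    size_t  readpos;    /* next unread byte */
} ReadBuf;

static size_t
buffered_read(ReadBuf *rb, char *dst, size_t len)
{
    size_t  avail = rb->datalen - rb->readpos;
    size_t  n = (len < avail) ? len : avail;

    memcpy(dst, rb->buf + rb->readpos, n);
    rb->readpos += n;                   /* just advance the cursor */

    if (rb->readpos == rb->datalen)
        rb->readpos = rb->datalen = 0;  /* buffer empty: ready to refill */

    return n;
}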

Fortunately, AFAICT the only way to get to this problem is to
manually LZ4-compress the toc.dat and/or blobs.toc files within a
directory-style archive. Few people do that, which likely explains
the lack of field complaints.

On top of that, a similar case with gzip doesn't work at all,
though it's supposed to:

$ pg_dump -Fd --compress=gzip -f rgzip.dir regression
$ gzip rgzip.dir/toc.dat
$ pg_restore -f /dev/null rgzip.dir
pg_restore: error: could not read from input file:

Tracking this down, it seems that Gzip_read doesn't cope with
a request to read zero bytes. I wonder how long that's been
broken.
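The wrinkle is that gzread() legitimately returns 0 both at EOF and
for a zero-byte request, so a reader that treats a zero or short
return as failure has to special-case a zero-length request before
calling into zlib. A guard of roughly this shape is what's needed
(hedged sketch only; the real Gzip_read has a different signature and
reports errors through pg_dump's machinery):

#include <stdbool.h>
#include <stddef.h>
#include <zlib.h>

/*
 * Sketch: return early for size == 0 so that gzread()'s zero return
 * is never mistaken for an error or a short read.
 */
static bool
gzip_read_sketch(gzFile fp, void *ptr, size_t size, size_t *rsize)
{
    int     ret;

    if (size == 0)
    {
        if (rsize)
            *rsize = 0;
        return true;            /* nothing to read; don't call gzread */
    }

    ret = gzread(fp, ptr, (unsigned) size);
    if (ret < 0 || (size_t) ret < size)
        return false;           /* real error or short read */

    if (rsize)
        *rsize = (size_t) ret;
    return true;
}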

As far as I can see, 002_pg_dump.pl doesn't exercise the case of
manually-compressed toc.dat files. I wonder why not.

0001 and 0002 attached are the same as before, then 0003 adds a fix
for the LZ4 performance problem, and 0004 fixes the Gzip_read problem.
While at it, I got rid of a few other minor inefficiencies such as
unnecessary buffer-zeroing.

regards, tom lane

Attachment Content-Type Size
v3-0001-Fix-poor-buffering-logic-in-pg_dump-s-lz4-and-zst.patch text/x-diff 13.5 KB
v3-0002-Try-to-align-the-block-sizes-of-pg_dump-s-various.patch text/x-diff 3.4 KB
v3-0003-Fix-serious-performance-problems-in-LZ4Stream_rea.patch text/x-diff 12.3 KB
v3-0004-Fix-issues-with-reading-zero-bytes-in-Gzip_read-a.patch text/x-diff 2.1 KB
