| From: | Josh Kupershmidt <schmiddy(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | pg_dump: eliminate tmpfile double-write in tar format output |
| Date: | 2026-04-17 00:47:00 |
| Message-ID: | CAK3UJRE_9-iQsQpYnaZFx6RPL9AUqA2wehAc7fNgiY2yhJPZig@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
Please find attached a patch optimizing pg_dump's tar format (-Ft) when
writing to a seekable file. The diff here is limited to
src/bin/pg_dump/pg_backup_tar.c.
Currently, every TOC entry in a tar-format dump goes through a temporary
file: data is written to a tmpfile, then on close we seek to the end of the
tmpfile to determine its length, write the tar header, and copy the entire
tmpfile to the tar output. We end up writing the data twice: once to
the tmpfile and once to the final tar file.
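
As a rough illustration of the tmpfile path described above (not the actual code in pg_backup_tar.c), the sketch below spools one member through a temporary file and then copies it into the archive. The function name `spool_member` is hypothetical, and the header is simplified to a name and an octal size with no checksum.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512

/*
 * Hypothetical helper: append one member to "tar" by spooling it through a
 * temporary file first. Returns the number of data bytes written across
 * both files (padding and header excluded), or -1 on error.
 */
static long
spool_member(FILE *tar, const char *name, const char *data, size_t len)
{
    FILE       *tmp;
    char        buf[TAR_BLOCK];
    long        size;
    long        written = 0;
    size_t      n;

    assert(tar != NULL && name != NULL);

    tmp = tmpfile();
    if (tmp == NULL)
        return -1;

    /* First write: the member data goes to the temporary file. */
    if (fwrite(data, 1, len, tmp) != len)
        goto fail;
    written += (long) len;

    /* On close: seek to the end to learn the length, then rewind. */
    if (fseek(tmp, 0, SEEK_END) != 0 || (size = ftell(tmp)) < 0)
        goto fail;
    rewind(tmp);

    /* Simplified header: member name plus size as 11 octal digits. */
    memset(buf, 0, sizeof(buf));
    snprintf(buf, 100, "%s", name);
    snprintf(buf + 124, 12, "%011lo", (unsigned long) size);
    if (fwrite(buf, 1, TAR_BLOCK, tar) != TAR_BLOCK)
        goto fail;

    /* Second write: copy the data into the archive in zero-padded blocks. */
    for (;;)
    {
        memset(buf, 0, sizeof(buf));
        n = fread(buf, 1, TAR_BLOCK, tmp);
        if (n == 0)
            break;
        if (fwrite(buf, 1, TAR_BLOCK, tar) != TAR_BLOCK)
            goto fail;
        written += (long) n;
    }
    if (ferror(tmp))
        goto fail;

    fclose(tmp);
    return written;

fail:
    fclose(tmp);
    return -1;
}
```

For a member of `len` bytes the function reports `2 * len` data bytes written, which is the double-write the patch aims to avoid.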
The patch adds a "direct-write" mode for seekable outputs. Instead of using
a tmpfile, we write a placeholder tar header (with length 0) directly to
the tar output, stream the data after it, then seek back to rewrite the
header with the actual length. This should halve the bytes written on the
data path.
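
The placeholder-then-rewrite sequence can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the attached diff: `DirectMember`, `fill_header`, `member_begin`, `member_write`, and `member_end` are hypothetical names, the header sets only a few ustar fields, and offsets are kept in a `long` for brevity (a real implementation would need a 64-bit offset type for large dumps).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512

/* Hypothetical state for one member written straight into the archive. */
typedef struct
{
    FILE       *tar;        /* seekable archive output */
    const char *name;       /* member name, needed again for the rewrite */
    long        hdr_pos;    /* file offset of this member's header */
    long        data_len;   /* data bytes written so far */
} DirectMember;

/* Build a simplified ustar header: name, mode, size, type, magic, checksum. */
static void
fill_header(char *h, const char *name, long size)
{
    unsigned int sum = 0;
    int         i;

    memset(h, 0, TAR_BLOCK);
    snprintf(h, 100, "%s", name);
    snprintf(h + 100, 8, "%07o", 0600u);
    snprintf(h + 124, 12, "%011lo", (unsigned long) size);
    h[156] = '0';                   /* regular file */
    memcpy(h + 257, "ustar", 6);    /* magic, including its NUL */
    memcpy(h + 263, "00", 2);       /* version */
    memset(h + 148, ' ', 8);        /* checksum field is summed as spaces */
    for (i = 0; i < TAR_BLOCK; i++)
        sum += (unsigned char) h[i];
    snprintf(h + 148, 7, "%06o", sum);  /* six digits, NUL, trailing space */
}

/* Step 1: remember where the header lives and write a length-0 placeholder. */
static int
member_begin(DirectMember *m, FILE *tar, const char *name)
{
    char        h[TAR_BLOCK];

    assert(tar != NULL);
    m->tar = tar;
    m->name = name;
    m->data_len = 0;
    m->hdr_pos = ftell(tar);
    if (m->hdr_pos < 0)
        return -1;
    fill_header(h, name, 0);
    return fwrite(h, 1, TAR_BLOCK, tar) == TAR_BLOCK ? 0 : -1;
}

/* Step 2: stream data directly after the placeholder header. */
static int
member_write(DirectMember *m, const void *data, size_t len)
{
    if (fwrite(data, 1, len, m->tar) != len)
        return -1;
    m->data_len += (long) len;
    return 0;
}

/* Step 3: pad to a block boundary, seek back, rewrite the header, return. */
static int
member_end(DirectMember *m)
{
    char        buf[TAR_BLOCK];
    long        pad = (TAR_BLOCK - m->data_len % TAR_BLOCK) % TAR_BLOCK;
    long        end_pos;

    memset(buf, 0, sizeof(buf));
    if (pad > 0 && fwrite(buf, 1, (size_t) pad, m->tar) != (size_t) pad)
        return -1;
    end_pos = ftell(m->tar);
    if (end_pos < 0 || fseek(m->tar, m->hdr_pos, SEEK_SET) != 0)
        return -1;
    fill_header(buf, m->name, m->data_len);
    if (fwrite(buf, 1, TAR_BLOCK, m->tar) != TAR_BLOCK)
        return -1;
    return fseek(m->tar, end_pos, SEEK_SET);
}
```

Each data byte is written once here. The only extra write is the second 512-byte header, at the cost of requiring an output that supports seeking.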
The tmpfile path is preserved as a fallback for three cases:
1. Output is not seekable (stdout/pipe)
2. Another member is already being written directly (guard against
interleaving)
3. We're in the LO section, where the blob TOC file stays open while
individual blob data files are written and closed inside it
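
The three fallback conditions amount to a single predicate. The sketch below is only an illustration of that decision. The struct `TarState`, its fields, and `can_write_direct` are hypothetical names, not identifiers from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the archive state when a member is opened. */
typedef struct
{
    bool        output_seekable;    /* false for stdout or a pipe */
    bool        direct_member_open; /* another member is mid-write */
    bool        in_lo_section;      /* blob TOC file is currently open */
} TarState;

/* Direct write is allowed only when none of the three fallback cases hold. */
static bool
can_write_direct(const TarState *st)
{
    assert(st != NULL);
    return st->output_seekable &&
        !st->direct_member_open &&
        !st->in_lo_section;
}
```

When the predicate is false, the member would go through the tmpfile path as before.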
On a test 500K-row database (~255MB, 184MB dump file), pg_dump -Ft time
goes down from about 1.42s (master) to 1.22s (patched), roughly a 14%
improvement. The percentage improvement is a bit smaller for larger
databases: dump time goes down from 10.24s (master) to 9.34s (patched),
roughly 9%, for a database about 10x as large.
A benchmark script (bench_tar_direct_write.sh) is included for reproducing
some of the performance testing I did.
Thanks,
Josh
| Attachment | Content-Type | Size |
|---|---|---|
| bench_tar_direct_write.sh | application/x-sh | 2.7 KB |
| pg_backup_tar_mode_direct_write.diff | application/octet-stream | 8.8 KB |