From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Filip Janus <fjanus(at)redhat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Adding compression of temporary files
Date: 2025-10-01 15:53:26
Message-ID: 8c9cd489-9d46-48bb-9a8d-64f4536a2abc@vondra.me
Lists: pgsql-hackers
Hi,
On 9/30/25 14:42, Tomas Vondra wrote:
>
> v20250930-0018-undo-unncessary-changes-to-Makefile.patch
>
> - Why did the 0001 patch add this? Maybe it's something we should add
> separately, not as part of this patch?
>
I realized this bit is actually necessary to make EXTRA_TESTS work
for the lz4 regression test. The attached patch series therefore skips
this bit.
There are also experimental patches adding gzip (or rather libz) and
zstd compression. This is very rough; I just wanted to see how these
would perform compared to pglz/lz4. But I haven't done any proper
evaluation so far, beyond running a couple of simple queries. I will try
to spend a bit more time on that soon.
I still wonder about the impact of stream compression. I know it can
improve the compression ratio, but I'm not sure it also helps with
compression speed. I think for temporary files faster compression
(and a lower ratio) may be a better trade-off. So maybe we should use
lower compression levels ...
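To illustrate the level/speed trade-off, here is a minimal sketch using
Python's stdlib zlib as a stand-in for libz (lz4/zstd aren't in the
standard library, and absolute numbers differ, but the shape of the
trade-off is the same): lower levels compress faster at a worse ratio.

```python
# Sketch: lower compression levels trade ratio for speed.
# zlib is used here only as a stdlib stand-in for libz; lz4/zstd
# show the same general pattern with different absolute numbers.
import time
import zlib

# Synthetic "temp file" payload, repetitive enough to compress well.
data = b"tuple data with some repetition " * 4096

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level={level} ratio={ratio:.1f}x time={elapsed * 1e3:.2f}ms")
```

For temp files, which are written once and read back soon after, the
fast end of that curve is likely the interesting one.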
Attached are two PDF files with results of the perf evaluation using
TPC-H 10GB and 50GB data sets. One table shows timings for the 22
queries with compression set to no/pglz/lz4, for a range of parameter
combinations (work_mem, parallel workers). The other shows the amount of
temporary file data (in MB) generated by each query.
The timing shows that pglz is pretty slow, about doubling duration for
some of the queries. That's not surprising, we know pglz can be slow.
lz4 is almost perfectly neutral, which is actually great - the goal is
to reduce I/O pressure for temporary files, but with a single query
running at a time, that's not a problem. So "no impact" is about the
best we can do, it shows the lz4 overhead is negligible.
The "size" PDF shows that compression can save a fair amount of temp
space. For many queries it saves 50-70% of the temporary space. A good
example is Q9 which (on the 50GB scale) used to take about 33GB, and
with compression it's down to ~17GB (with both pglz and lz4). That's
pretty good, I think.
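As a back-of-the-envelope check of the Q9 numbers quoted above:

```python
# Q9 at the 50GB scale: ~33GB of temp files uncompressed,
# ~17GB with pglz/lz4 compression (numbers from the PDFs).
uncompressed_gb = 33
compressed_gb = 17

savings = 1 - compressed_gb / uncompressed_gb
print(f"temp space saved: {savings:.0%}")  # prints "temp space saved: 48%"
```

So Q9 sits just below the 50-70% range seen for many of the other
queries.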
FWIW the "size" results may be a bit misleading, in that they measure
tempfile size for the whole query. But some queries may use multiple
temporary files, and some of those may not support compression (e.g.
tuplesort doesn't), which will make the actual compression ratio look
lower. OTOH it's more representative of the impact on actual queries.
regards
--
Tomas Vondra