From 865a0ccacf8ec065426b1e2823aba9d3cc2c1caf Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 10 Oct 2025 22:08:13 -0400
Subject: [PATCH v2 2/2] Try to align the block sizes of pg_dump's various
 compression modes.

(This is more of a straw man for discussion than a finished patch.)

After the previous patch, compress_zstd.c tends to produce data block
sizes around 128K, and we don't really have any control over that
unless we want to overrule ZSTD_CStreamOutSize(), which seems like
a bad idea.  But let's try to align the other compression modes to
produce block sizes roughly comparable to that, so that pg_restore's
skip-data performance isn't enormously different for different modes.
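For illustration, here is a minimal sketch (not the actual
compress_zstd.c code; write_block() is a stand-in for emitting one
archive data block) of why the zstd path lands near 128K: the output
buffer is sized by ZSTD_CStreamOutSize(), and a data block is written
each time that buffer fills.

    #include <stdlib.h>
    #include <zstd.h>

    static void
    sketch_zstd_stream(const void *data, size_t len,
                       void (*write_block) (const void *, size_t))
    {
        ZSTD_CCtx      *cctx = ZSTD_createCCtx();
        size_t          outsize = ZSTD_CStreamOutSize();   /* ~128K */
        void           *outbuf = malloc(outsize);
        ZSTD_inBuffer   in = {data, len, 0};
        ZSTD_outBuffer  out = {outbuf, outsize, 0};

        while (in.pos < in.size)
        {
            (void) ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_continue);
            if (out.pos == out.size)    /* buffer full: emit ~128K block */
            {
                write_block(out.dst, out.pos);
                out.pos = 0;
            }
        }
        /* (final ZSTD_e_end flush and error handling omitted) */
        free(outbuf);
        ZSTD_freeCCtx(cctx);
    }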

gzip compression can be brought in line simply by setting
DEFAULT_IO_BUFFER_SIZE = 128K, which this patch does.  That
increases some unrelated buffer sizes, but none of them seem
problematic for modern platforms.
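As a hedged illustration of why the one-line change suffices (the
function and helper names here are invented, not the actual
compress_gzip.c code): the gzip path emits one data block whenever its
DEFAULT_IO_BUFFER_SIZE output buffer fills, so block size scales
directly with the buffer.

    #include <zlib.h>

    #define IO_BUF_SIZE (128 * 1024)    /* new DEFAULT_IO_BUFFER_SIZE */

    /*
     * Assumes "zs" was set up with deflateInit() and that zs->next_out
     * and zs->avail_out track "outbuf" across calls; write_block() is a
     * stand-in for emitting one archive data block.
     */
    static void
    sketch_gzip_write(z_stream *zs, unsigned char *outbuf,
                      const void *data, size_t len,
                      void (*write_block) (const void *, size_t))
    {
        zs->next_in = (Bytef *) data;
        zs->avail_in = (uInt) len;
        while (zs->avail_in > 0)
        {
            (void) deflate(zs, Z_NO_FLUSH);
            if (zs->avail_out == 0)     /* buffer full: emit one block */
            {
                write_block(outbuf, IO_BUF_SIZE);
                zs->next_out = outbuf;
                zs->avail_out = IO_BUF_SIZE;
            }
        }
    }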

lz4's idea of appropriate block size is highly nonlinear:
if we just increase DEFAULT_IO_BUFFER_SIZE then the output
blocks end up around 200K.  I found that adjusting the slop
factor in LZ4State_compression_init was a not-too-ugly way
of bringing that number into line.
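For reference, the sizing arithmetic in isolation (the same change as
the compress_lz4.c hunk below, shown standalone): the buffer bounds the
worst-case compressed size of one DEFAULT_IO_BUFFER_SIZE input chunk,
plus 50% slop so we normally aren't forced to flush on every chunk.

    #include <lz4frame.h>

    #define IO_BUF_SIZE (128 * 1024)    /* new DEFAULT_IO_BUFFER_SIZE */

    static size_t
    sketch_lz4_buflen(const LZ4F_preferences_t *prefs)
    {
        /* worst case for compressing one input chunk */
        size_t  buflen = LZ4F_compressBound(IO_BUF_SIZE, prefs);

        /* 50% slop, empirically yielding ~128K output blocks */
        buflen += buflen / 2;
        return buflen;
    }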

With compress = none you get data blocks the same size
as the table rows.  We could avoid that by introducing
an additional layer of buffering, but it's not clear to
me that that's a net win, so this patch doesn't do so.
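If we did want to do it, the extra layer might look roughly like this
hypothetical sketch (names invented here, not part of the patch):
accumulate row data in a DEFAULT_IO_BUFFER_SIZE buffer and emit one
data block per buffer fill.

    #include <string.h>

    #define IO_BUF_SIZE (128 * 1024)

    typedef struct
    {
        char    buf[IO_BUF_SIZE];
        size_t  used;
    } NoneBuffer;

    /* write_block() is a stand-in for emitting one archive data block */
    static void
    none_write(NoneBuffer *nb, const char *data, size_t len,
               void (*write_block) (const void *, size_t))
    {
        while (len > 0)
        {
            size_t  n = IO_BUF_SIZE - nb->used;

            if (n > len)
                n = len;
            memcpy(nb->buf + nb->used, data, n);
            nb->used += n;
            data += n;
            len -= n;
            if (nb->used == IO_BUF_SIZE)
            {
                write_block(nb->buf, nb->used);
                nb->used = 0;
            }
        }
    }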

Comments in compress_io.h and 002_pg_dump.pl suggest that if
we increase DEFAULT_IO_BUFFER_SIZE then we need to increase the
amount of data fed through the tests in order to maintain coverage.
I've not done that here either.  In my view, the decompression side
of compress_lz4.c needs to be rewritten to be simpler, rather than
tested more.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
---
 src/bin/pg_dump/compress_io.h  | 2 +-
 src/bin/pg_dump/compress_lz4.c | 9 +++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index 25a7bf0904d..53cf8c9b03b 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -24,7 +24,7 @@
  * still exercise all the branches. This applies especially if the value is
  * increased, in which case the overflow buffer may not be needed.
  */
-#define DEFAULT_IO_BUFFER_SIZE	4096
+#define DEFAULT_IO_BUFFER_SIZE	(128 * 1024)
 
 extern char *supports_compression(const pg_compress_specification compression_spec);
 
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 47ee2e4bbac..c9ea895c137 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -102,9 +102,14 @@ LZ4State_compression_init(LZ4State *state)
 	state->buflen = LZ4F_compressBound(DEFAULT_IO_BUFFER_SIZE, &state->prefs);
 
 	/*
-	 * Then double it, to ensure we're not forced to flush every time.
+	 * Add some slop to ensure we're not forced to flush every time.
+	 *
+	 * The present slop factor of 50% is chosen so that the typical output
+	 * block size is about 128K when DEFAULT_IO_BUFFER_SIZE = 128K.  We might
+	 * need a different slop factor to maintain that equivalence if
+	 * DEFAULT_IO_BUFFER_SIZE is changed dramatically.
 	 */
-	state->buflen *= 2;
+	state->buflen += state->buflen / 2;
 
 	/*
 	 * LZ4F_compressBegin requires a buffer that is greater or equal to
-- 
2.43.7

