Re: zstd compression for pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: zstd compression for pg_dump
Date: 2021-01-04 02:53:21
Message-ID: 20210104025321.GA9712@telsasoft.com
Lists: pgsql-hackers

On Mon, Dec 21, 2020 at 01:49:24PM -0600, Justin Pryzby wrote:
> a big disadvantage of piping through zstd is that it's not identified as a
> PGDMP file, and, /usr/bin/file on centos7 fails to even identify zstd by its
> magic number..

Other reasons are that pg_dump |zstd >output.zst loses the exit status of
pg_dump, and that it's not "transparent" (one needs to type
"zstd -dq |pg_restore").
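
The identification problem quoted above comes down to the first few bytes of the output file. As a minimal sketch (not code from the patch set), a reader could classify a dump by its leading bytes as follows. The enum and function names are hypothetical; the byte values are the "PGDMP" signature of custom-format archives, the zstd frame magic 0xFD2FB528 stored little-endian, and the gzip magic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification result; not a pg_dump type. */
typedef enum { MAGIC_UNKNOWN, MAGIC_PGDMP, MAGIC_ZSTD, MAGIC_GZIP } DumpMagic;

/* Classify a file by its leading bytes. */
DumpMagic classify_magic(const unsigned char *buf, size_t len)
{
    /* zstd frame magic 0xFD2FB528, as it appears on disk (little-endian) */
    static const unsigned char zstd_magic[4] = {0x28, 0xB5, 0x2F, 0xFD};
    /* gzip member header: ID1, ID2 */
    static const unsigned char gzip_magic[2] = {0x1F, 0x8B};

    assert(buf != NULL);

    /* A custom-format archive starts with the ASCII bytes "PGDMP" */
    if (len >= 5 && memcmp(buf, "PGDMP", 5) == 0)
        return MAGIC_PGDMP;
    if (len >= sizeof zstd_magic && memcmp(buf, zstd_magic, sizeof zstd_magic) == 0)
        return MAGIC_ZSTD;
    if (len >= sizeof gzip_magic && memcmp(buf, gzip_magic, sizeof gzip_magic) == 0)
        return MAGIC_GZIP;
    return MAGIC_UNKNOWN;
}
```

A file produced by "pg_dump -Fc -Z0 |zstd" would be classified as MAGIC_ZSTD here, since the PGDMP signature is hidden inside the zstd frame; that is the loss of identification described above.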

On Mon, Dec 21, 2020 at 08:32:35PM -0600, Justin Pryzby wrote:
> On Mon, Dec 21, 2020 at 03:02:40PM -0500, Tom Lane wrote:
> > Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> > > I found that our largest tables are 40% smaller and 20% faster to pipe
> > > pg_dump -Fc -Z0 |zstd relative to native zlib
> >
> > The patch might be a tad smaller if you hadn't included a core file in it.
>
> About 89% smaller.
>
> This also fixes the extension (.zst)
> And fixes zlib default compression.
> And a bunch of cleanup.

I rebased so the "typedef struct compression" patch is first and zstd on top of
that (say, in case someone wants to bikeshed about which compression algorithm
to support). And made a central struct with all the compression-specific info
to further isolate the compress-specific changes.
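
To illustrate the idea of a central struct holding the compression-specific info, here is a rough sketch. It is not the struct from the attached patches: the type name CompressLibInfo, its fields, the table and the lookup function are all assumptions made for illustration. Only the ".zst" extension and the CompressionAlgorithm name appear in this thread. The default levels shown (6 for zlib, 3 for zstd) are the libraries' usual defaults and not necessarily what the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Enum values are hypothetical; the patch's actual definition may differ. */
typedef enum { COMPR_ALG_NONE, COMPR_ALG_ZLIB, COMPR_ALG_ZSTD } CompressionAlgorithm;

/* Hypothetical per-library descriptor: one row per supported algorithm. */
typedef struct
{
    CompressionAlgorithm alg;
    const char *name;          /* user-visible name */
    const char *extension;     /* file suffix, "" if none */
    int         default_level; /* assumed library default */
} CompressLibInfo;

static const CompressLibInfo compress_libs[] = {
    {COMPR_ALG_NONE, "none", "",     0},
    {COMPR_ALG_ZLIB, "zlib", ".gz",  6},
    {COMPR_ALG_ZSTD, "zstd", ".zst", 3},
};

/* Pick the table entry whose extension matches the end of the filename. */
const CompressLibInfo *compress_lib_for_file(const char *filename)
{
    size_t flen, i;

    assert(filename != NULL);
    flen = strlen(filename);

    for (i = 0; i < sizeof compress_libs / sizeof compress_libs[0]; i++)
    {
        const char *ext = compress_libs[i].extension;
        size_t      elen = strlen(ext);

        if (elen > 0 && flen >= elen &&
            strcmp(filename + flen - elen, ext) == 0)
            return &compress_libs[i];
    }
    return &compress_libs[0];   /* no known suffix: treat as uncompressed */
}
```

With a table like this, supporting another algorithm means adding a row rather than touching every call site, which is the isolation aimed for above.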

And handle compression of "plain" archive format.
And fix compilation for MSVC and make --without-zstd the default.

And fix cfgets() (although I think it is actually unused in the code paths for
a compressed FP).

And add a fix for a pre-existing problem: ftello() on unseekable input.
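
For context on the ftello() issue, a small POSIX sketch (not the code in the attached patch) shows the general behavior: on a pipe, ftello() returns -1 and sets errno to ESPIPE, so a caller has to check for that before using the offset or reporting an error. Both helper function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the stream reports a position, 0 if it is unseekable
 * (pipe, FIFO, socket), and -1 on any other ftello() failure.
 */
int stream_is_seekable(FILE *fp)
{
    off_t pos;

    assert(fp != NULL);
    errno = 0;
    pos = ftello(fp);
    if (pos >= 0)
        return 1;
    return (errno == ESPIPE) ? 0 : -1;
}

/* Demonstration helper: run the check against the read end of a pipe. */
int pipe_stream_is_seekable(void)
{
    int   fds[2];
    int   result;
    FILE *fp;

    if (pipe(fds) != 0)
        return -1;
    fp = fdopen(fds[0], "r");
    if (fp == NULL)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    result = stream_is_seekable(fp);
    fclose(fp);                 /* also closes fds[0] */
    close(fds[1]);
    return result;
}
```

A regular file passes the check, while input arriving through a pipe (as in "zstd -dq |pg_restore") does not, so any offset returned in that case must not be trusted.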

I also started a patch to allow compression of "tar" format, but I didn't
include that here yet.

Note that there are currently several "compression" patches in the CF app. This
patch seems to be independent of the others, but probably shouldn't be totally
uncoordinated (for example, adding lz4 in one and zstd in another might be poor
execution).

https://commitfest.postgresql.org/31/2897/
- Faster pglz compression
https://commitfest.postgresql.org/31/2813/
- custom compression methods for toast
https://commitfest.postgresql.org/31/2773/
- libpq compression

--
Justin

Attachment                                                Content-Type Size
0001-fix-preeexisting.patch                               text/x-diff  979 bytes
0002-Fix-broken-error-message-on-unseekable-input.patch   text/x-diff  1.4 KB
0003-Support-multiple-compression-algs-levels-opts.patch  text/x-diff  34.1 KB
0004-struct-compressLibs.patch                            text/x-diff  5.9 KB
0005-Use-cf-abstraction-in-archiver-and-tar.patch         text/x-diff  14.8 KB
0006-pg_dump-zstd-compression.patch                       text/x-diff  30.0 KB
0007-fix-comments.patch                                   text/x-diff  3.5 KB
0008-union-with-a-CompressionAlgorithm-alg.patch          text/x-diff  13.9 KB
0009-Move-zlib-into-the-union.patch                       text/x-diff  4.6 KB
