Re: zstd compression for pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>
Subject: Re: zstd compression for pg_dump
Date: 2021-01-04 07:06:00
Message-ID: 20210104070600.GB9712@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 04, 2021 at 11:04:57AM +0500, Andrey Borodin wrote:
> > 4 янв. 2021 г., в 07:53, Justin Pryzby <pryzby(at)telsasoft(dot)com> написал(а):
> > Note, there's currently several "compression" patches in CF app. This patch
> > seems to be independent of the others, but probably shouldn't be totally
> > uncoordinated (like adding lz4 in one and ztsd in another might be poor
> > execution).
> >
> > https://commitfest.postgresql.org/31/2897/
> > - Faster pglz compression
> > https://commitfest.postgresql.org/31/2813/
> > - custom compression methods for toast
> > https://commitfest.postgresql.org/31/2773/
> > - libpq compression
>
> I think that's downside of our development system: patch authors do not want to create dependencies on other patches.

I think in these cases, someone who notices common/overlapping patches should
suggest that the authors review each other's work. In some cases, I think it's
appropriate to come up with a "shared" preliminary patch(es), which both (all)
patch authors can include as 0001 until its finalized and merged. That might
be true for some things like the tableam work, or the two "online checksum"
patches.

> I'd say that both lz4 and zstd should be supported in TOAST, FPIs, libpq, and pg_dump. As to pglz - I think we should not proliferate it any further.

pg_basebackup came up as another use on another thread, I think related to
libpq protocol compression.

> Libpq compression encountered some problems with memory consumption which
> required some extra config efforts. Did you measure memory usage for this
> patchset?

RAM use is not significantly different from zlib, except that zstd --long adds
more memory.

$ command time -v pg_dump -d ts -t ... -Fc -Z0 |wc -c
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.77
Maximum resident set size (kbytes): 40504
1397288924 # no compression: 1400MB

$ command time -v pg_dump -d ts -t ... -Fc |wc -c
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:37.17
Maximum resident set size (kbytes): 40504
132932415 # default (zlib) compression: 132 MB

$ command time -v ./pg_dump -d ts -t ... -Fc |wc -c
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.28
Maximum resident set size (kbytes): 40568
86048139 # zstd: 86MB

$ command time -v ./pg_dump -d ts -t ... -Fc -Z 'alg=zstd opt=zstdlong' |wc -c
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:30.49
Maximum resident set size (kbytes): 180332
72202937 # zstd long: 180MB

> [PATCH 04/20] struct compressLibs
> I think this directive would be correct.
> +// #ifdef HAVE_LIBZ?

I'm not sure .. I'm thinking of making the COMPR_ALG_* always defined, and then
fail later if an operation is unsupported. There's an excessive number of
#ifdefs already, so the early commits are intended to minimize as far as
possible what's needed for each additional compression
algorithm(lib/method/whatever it's called). I haven't tested much with
pg_restore of files with unsupported compression libs.

> [PATCH 06/20] pg_dump: zstd compression
> I'd propose to build with Zstd by default. It seems other patches do it this way. Though, I there are possible downsides.

Yes...but the cfbot turns red if the patch require zstd, so it defaults to
off until it's included in the build environments (but for now, the main patch
isn't being tested).

Thanks for looking.

--
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Luc Vlaming 2021-01-04 07:59:01 Re: New Table Access Methods for Multi and Single Inserts
Previous Message Dilip Kumar 2021-01-04 06:52:36 Re: [HACKERS] Custom compression methods