Re: zstd compression for pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: gkokolatos(at)pm(dot)me, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>
Subject: Re: zstd compression for pg_dump
Date: 2023-03-05 17:47:58
Message-ID: 20230305174758.GA11636@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 25, 2023 at 07:22:27PM -0600, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 01:18:40PM -0600, Justin Pryzby wrote:
> > This is a draft patch - review is welcome and would help to get this
> > ready to be considererd for v16, if desired.
> >
> > I'm going to add this thread to the old CF entry.
> > https://commitfest.postgresql.org/31/2888/
>
> This resolves cfbot warnings: windows and cppcheck.
> And refactors zstd routines.
> And updates docs.
> And includes some fixes for earlier patches that these patches conflicts
> with/depends on.

This rebases over the TAP and doc fixes to LZ4.
And adds necessary ENV to makefile and meson.
And adds an annoying boilerplate header.
And removes supports_compression(), which is what I think Tomas meant
when referring to "annoying unsupported cases".
And updates zstd.c: fix an off-by-one, allocate in init depending on
readF/writeF, do not reset the input buffer on each iteration, and show
parameter name in errors.

I'd appreciate help checking that this is doing the right things and
works correctly with zstd threaded workers. The zstd API says: "use one
different context per thread for parallel execution" and "For parallel
execution, use one separate ZSTD_CStream per thread".
https://github.com/facebook/zstd/blob/dev/lib/zstd.h

I understand that to mean that, if pg_dump *itself* were using threads,
then each thread would need to call ZSTD_createCStream(). pg_dump isn't
threaded, so there's nothing special needed, right?

Except that, under windows, pg_dump -Fd -j actually uses threads instead
of forking. I *think* that's still safe, since the pgdump threads are
created *before* calling zstd functions (see _PrintTocData and
_StartData of the custom and directory formats), so it happens naturally
that there's a separate zstd stream for each thread of pgdump.

--
Justin

Attachment Content-Type Size
0001-pg_dump-zstd-compression.patch text/x-diff 28.8 KB
0002-TMP-pg_dump-use-Zstd-by-default-for-CI-only.patch text/x-diff 2.2 KB
0003-zstd-support-long-distance-mode-in-pg_dump-basebacku.patch text/x-diff 13.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-03-05 17:54:31 Re: Date-Time dangling unit fix
Previous Message Tom Lane 2023-03-05 16:51:05 Re: [Question] Similar Cost but variable execution time in sort