Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-12-19 17:03:21
Message-ID: IZzs7Z2ny5hRK1EhTynSg3DnCo_z-UvEJN4Q6Hf0CUHWx3BcNwt-2z5vvvvA-Z8PQL-AYiL9YQPlKylATZ_Z6HnJ55YZUHgL66H9MZQFi48=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Monday, December 19th, 2022 at 5:06 AM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

>
>
> On Sat, Dec 17, 2022 at 05:26:15PM -0600, Justin Pryzby wrote:
>

Thank you for the comments, please find v18 attached.

> > 001: still refers to "gzip", which is correct for -Fp and -Fd but not
> > for -Fc, for which it's more correct to say "zlib".
>
>
> Or should we begin by changing all these existing "not built with zlib
> support" error strings to the more generic "this build does not
> support compression with %s" to reduce the number of messages to
> translate? That would bring consistency with the other tools dealing
> with compression.

This has been the approach from 0002 on-wards. In the attached it is also
applied on the remaining location in 0001.

>
> > That affects the
> > name of the function, structures, comments, etc. I'm not sure if it's
> > an issue to re-use the basebackup compression routines here. Maybe we
> > should accept "-Zn" for zlib output (-Fc), but reject "gzip:9", which
> > I'm sure some will find confusing, as it does not output. Maybe 001
> > should be split into a patch to re-use the existing "cfp" interface
> > (which is a clear win), and 002 to re-use the basebackup interfaces for
> > user input and constants, etc.
> >
> > 001 still doesn't compile on freebsd, and 002 doesn't compile on
> > windows. Have you checked test results from cirrusci on your private
> > github account ?

There are still known gaps in 0002 and 0003, for example documentation,
and I have not been focusing too much on those. You are right, it is helpful
and kind to try to reduce the noise. The attached should have hopefully
tackled the ci errors.

>
> FYI, I have re-added an entry to the CF app to get some automated
> coverage:
> https://commitfest.postgresql.org/41/3571/

Much obliged. Should I change the state to "ready for review" when post a
new version or should I leave that to the senior personnel?

>
> On MinGW, a complain about the open() callback, which I guess ought to
> be avoided with a rename:
> [00:16:37.254] compress_gzip.c:356:38: error: macro "open" passed 4 arguments, but takes just 3
> [00:16:37.254] 356 | ret = CFH->open(fname, -1, mode, CFH);
>
> [00:16:37.254] | ^
> [00:16:37.254] In file included from ../../../src/include/c.h:1309,
> [00:16:37.254] from ../../../src/include/postgres_fe.h:25,
> [00:16:37.254] from compress_gzip.c:15:
>
> On MSVC, some declaration conflicts, for a similar issue:
> [00:12:31.966] ../src/bin/pg_dump/compress_io.c(193): error C2371: '_read': redefinition; different basic types
> [00:12:31.966] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt_io.h(252): note: see declaration of '_read'
> [00:12:31.966] ../src/bin/pg_dump/compress_io.c(210): error C2371: '_write': redefinition; different basic types
> [00:12:31.966] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt_io.h(294): note: see declaration of '_write'
>

A rename was enough.

> > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor()
> > doesn't store the passed-in compression_spec.
>

I am afraid I have not been able to reproduce this error. I tried both
debian and freebsd after I addressed the compilation warnings. Which
error did you get? Is it still present in the attached?

> Hmm. This looks like a gap in the existing tests that we'd better fix
> first. This CI is green on Linux.

As the code stands, the compression level is not stored in the custom
format's header as it is no longer relevant information. We can decide
to make it relevant for the tests only on the expense of increasing
dump size by four bytes. In either case this is not applicable in
current head and can wait for 0002's turn.

Cheers,
//Georgios

>
> > 003 still uses <lz4.h> and not "lz4.h".
>
>
> This should be <lz4.h>, not "lz4.h".
>
> --
> Michael

Attachment Content-Type Size
v18-0001-Prepare-pg_dump-internals-for-additional-compres.patch text/x-patch 22.5 KB
v18-0002-Introduce-Compressor-API-in-pg_dump.patch text/x-patch 65.1 KB
v18-0003-Add-LZ4-compression-in-pg_-dump-restore.patch text/x-patch 28.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2022-12-19 17:18:41 Re: meson files copyright
Previous Message Vik Fearing 2022-12-19 16:53:46 Re: GROUP BY ALL