Re: Add LZ4 compression in pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: gkokolatos(at)pm(dot)me
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-01-08 19:45:25
Message-ID: 20230108194524.GA27637@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 22, 2022 at 11:08:59AM -0600, Justin Pryzby wrote:
> There's a couple of lz4 bits which shouldn't be present in 002: file
> extension and comments.

There were "LZ4" comments and file extension stuff in the preparatory
commit. But now it seems like you *removed* them in the LZ4 commit
(where it actually belongs) rather than *moving* it from the
prior/parent commit *to* the lz4 commit. I recommend to run something
like "git diff @{1}" whenever doing this kind of patch surgery.

+ if (AH->compression_spec.algorithm != PG_COMPRESSION_NONE &&
+ AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&

This looks wrong/redundant. The gzip part should be removed, right ?

Maybe other places that check if (compression==PG_COMPRESSION_GZIP)
should maybe change to say compression!=NONE?

_PrepParallelRestore() references ".gz", so I think it needs to be
retrofitted to handle .lz4. Ideally, that's built into a struct or list
of file extensions to try. Maybe compression.h should have a function
to return the file extension of a given algorithm. I'm planning to send
a patch for zstd, and hoping its changes will be minimized by these
preparatory commits.

+ errno = errno ? : ENOSPC;

"?:" is a GNU extension (not the ternary operator, but the ternary
operator with only 2 args). It's not in use anywhere else in postgres.
You could instead write it with 3 "errno"s or as "if (errno==0):
errno=ENOSPC"

You wrote "eol_flag == false" and "eol_flag == 0" and true. But it's
cleaner to test it as a boolean: if (eol_flag) / if (!eol_flag).

Both LZ4File_init() and its callers check "inited". Better to do it in
one place than 3. It's a static function, so I think there's no
performance concern.

Gzip_close() still has a useless save_errno (or rebase issue?).

I think it's confusing to have two functions, one named
InitCompressLZ4() and InitCompressorLZ4().

pg_compress_specification is being passed by value, but I think it
should be passed as a pointer, as is done everywhere else.

pg_compress_algorithm is being writen directly into the pg_dump header.
Currently, I think that's not an externally-visible value (it could be
renumbered, theoretically even in a minor release). Maybe there should
be a "private" enum for encoding the pg_dump header, similar to
WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ? Or else a comment there
should warn that the values are encoded in pg_dump, and must never be
changed.

+ Verify that data files where compressed
typo: s/where/were/

Also:
s/occurance/occurrence/
s/begining/beginning/
s/Verfiy/Verify/
s/nessary/necessary/

BTW I noticed that cfdopen() was accidentally committed to compress_io.h
in master without being defined anywhere.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2023-01-08 19:53:00 Re: Fix gin index cost estimation
Previous Message Vik Fearing 2023-01-08 19:21:03 Re: Todo: Teach planner to evaluate multiple windows in the optimal order