Re: zstd compression for pg_dump

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: zstd compression for pg_dump
Date: 2021-01-10 21:06:25
Message-ID: ced272a6-2f02-73cf-b588-03107a257c29@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/4/21 3:53 AM, Justin Pryzby wrote:
>> About 89% smaller.

Did a quick code review of the patch. I have not yet taken it for a spin
yet and there are parts of the code I have not read yet.

## Is there any reason for this diff?

- cfp *fp = pg_malloc(sizeof(cfp));
+ cfp *fp = pg_malloc0(sizeof(cfp));

## Since we know have multiple returns in cfopen() I am not sure that
setting fp to NULL is still clearer than just returning NULL.

## I do not like that this pretends to support r+, w+ and a+ but does
not actually do so since it does not create an input stream in those cases.

else if (mode[0] == 'w' || mode[0] == 'a' ||
strchr(mode, '+') != NULL)
[...]
else if (strchr(mode, 'r'))

## Wouldn't cfread(), cfwrite(), cfgetc(), cfgets(), cfclose() and
cfeof() be cleaner with sitch statments similar to cfopen()?

## "/* Should be called "method" or "library" ? */"

Maybe, but personally I think algorithm is fine too.

## "Is a nondefault level set ?"

The PostgreSQL project does not use space before question mark (at least
not in English).

## Why isn't level_set just a local variable in parse_compression()? It
does not seem to be used elsewhere.

## Shouldn't we call the Compression variable in OpenArchive()
nocompress to match with the naming convention in other places.

And in general I wonder if we should not write "nocompression =
{COMPR_ALG_NONE}" rather than "nocompression = {0}".

## Why not use const on the pointers to Compression for functions like
cfopen()? As far as I can see several of them could be const.

## Shouldn't "AH->compression.alg = Z_DEFAULT_COMPRESSION" in ReadHead()
be "AH->compression.alg = COMPR_ALG_DEFAULT"?

Additionally I am not convinced that returning COMPR_ALG_DEFAULT will
even work but I have not had the time to test that theory yet. And in
general I am quite sceptical of that we really need of COMPR_ALG_DEFAULT.

## Some white space issues

Add spaces around plus in "atoi(1+eq)" and "pg_log_error("unknown
compression algorithm: %s", 1+eq)".

Add spaces around plus in parse_compression(), e.g. in "strlen(1+eq)".

## Shouldn't hasSuffix() take the current compression algorithm as a
parameter? Or alternatively look up which compression algorithm to use
from the suffix?

## Why support multiple ways to write zlib on the command line? I do not
see any advatange of being able to write it as libz.

## I feel renaming SaveOutput() to GetOutput() would make it more clear
what it does now that you have changed the return type.

## You have accidentally committed "-runstatedir" in configure. I have
no idea why we do not have it (maybe it is something Debian specific)
but even if we are going to add it it should not be in this patch. Same
with the parenthesis changes to LARGE_OFF_T.

## This is probably out of scope of your patch but I am not a fan of the
fallback logic in cfopen_read(). I feel ideally we should always know if
there is a suffix or not and not try to guess file names and do
pointless syscalls.

## COMPR_ALG_DEFAULT looks like it would error out for archive and
directory if someone has neither zlib nor zstandard. It feels like it
should default to uncompressed if we have neither. Or at least give a
better error message.

> Note, there's currently several "compression" patches in CF app. This patch
> seems to be independent of the others, but probably shouldn't be totally
> uncoordinated (like adding lz4 in one and ztsd in another might be poor
> execution).

A thought here is that maybe we want to use the same values for the
enums in all patches. Especially if we write the numeric value to pg
dump files.

Andreas

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2021-01-10 21:11:07 Re: Inconsistent "<acronym>" use
Previous Message Tom Lane 2021-01-10 20:58:56 Re: Terminate the idle sessions