Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, shiy(dot)fnst(at)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-02-27 14:33:04
Message-ID: qSzzmYsXqUf1gDvFspOEbqoE8U_F8IsphDG8c25VZ5KTXimQVYfLvnCcuU7_X2raa2SzZinDF0NpDIKSOALRUYP8pu_NL5QJwIDdR_RT-4U=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Sunday, February 26th, 2023 at 3:59 PM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

>
>
>
>
> On 2/25/23 06:02, Justin Pryzby wrote:
>
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
> >
> > - I'm unclear about get_error_func(). That's called in three places
> > from pg_backup_directory.c, after failures from write_func(), to
> > supply an compression-specific error message to pg_fatal(). But it's
> > not being used outside of directory format, nor for errors for other
> > function pointers, or even for all errors in write_func(). Is there
> > some reason why each compression method's write_func() shouldn't call
> > pg_fatal() directly, with its compression-specific message ?
>
>
> I think there are a couple more places that might/should call
> get_error_func(). For example ahwrite() in pg_backup_archiver.c now
> simply does
>
> if (bytes_written != size * nmemb)
> WRITE_ERROR_EXIT;
>
> but perhaps it should call get_error_func() too. There are probably
> other places that call write_func() and should use get_error_func().

Agreed, calling get_error_func() would be preferable to a fatal error. It
should be the caller of the api who decides how to proceed.

>
> > - I still think supports_compression() should be renamed, or made into a
> > static function in the necessary file. The main reason is that it's
> > more clear what it indicates - whether compression is "implemented by
> > pgdump" and not whether compression is "supported by this postgres
> > build". It also seems possible that we'd want to add a function
> > called something like supports_compression(), indicating whether the
> > algorithm is supported by the current build. It'd be better if pgdump
> > didn't subjugate that name.
>
>
> If we choose to rename this to have pgdump_ prefix, fine with me. But I
> don't think there's a realistic chance of conflict, as it's restricted
> to pgdump header etc. And it's not part of an API, so I guess we could
> rename that in the future if needed.

Agreed, it is very unrealistic that one will include that header file anywhere
but within pg_dump. Also. I think that adding a prefix, "pgdump", "pg_dump",
or similar does not add value and subtracts readability.

>
> > - Finally, the "Nothing to do in the default case" comment comes from
> > Michael's commit 5e73a6048:
> >
> > + /*
> > + * Custom and directory formats are compressed by default with gzip when
> > + * available, not the others.
> > + /
> > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > + !user_compression_defined)
> > {
> > #ifdef HAVE_LIBZ
> > - if (archiveFormat == archCustom || archiveFormat == archDirectory)
> > - compressLevel = Z_DEFAULT_COMPRESSION;
> > - else
> > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > + &compression_spec);
> > +#else
> > + / Nothing to do in the default case */
> > #endif
> > - compressLevel = 0;
> > }
> >
> > As the comment says: for -Fc and -Fd, the compression is set to zlib, if
> > enabled, and when not otherwise specified by the user.
> >
> > Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, and when
> > zlib was unavailable.
> >
> > But I'm not sure why there's now an empty "#else". I also don't know
> > what "the default case" refers to.
> >
> > Maybe the best thing here is to move the preprocessor #if, since it's no
> > longer in the middle of a runtime conditional:
> >
> > #ifdef HAVE_LIBZ
> > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > + !user_compression_defined)
> > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > + &compression_spec);
> > #endif
> >
> > ...but that elicits a warning about "variable set but not used"...
>
>
> Not sure, I need to think about this a bit.

Not having warnings is preferable, isn't it? I can understand the confusion
on the message though. Maybe a phrasing like:
/* Nothing to do for the default case when LIBZ is not available */
is easier to understand.

Cheers,
//Georgios

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema 2023-02-27 14:33:42 Re: libpq: PQgetCopyData() and allocation overhead
Previous Message Melanie Plageman 2023-02-27 14:24:01 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)