Re: Refactoring of compression options in pg_basebackup

From: gkokolatos(at)pm(dot)me
To: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Refactoring of compression options in pg_basebackup
Date: 2022-01-03 15:35:57
Message-ID: YuCGAJCBv6dd06z-v1gysn6F69MBWxspGLLGa8wJBaj8BrYs_rxJT3BJuh1U_LEDghumIN2N9ed-Gfve7K2dBeseCCOsypharQkXknnrxeo=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, December 18th, 2021 at 12:29 PM, Michael Paquier
<michael(at)paquier(dot)xyz> wrote:
>Hi all,
>(Added Georgios in CC.)

thank you for the shout out.

>When working on the support of LZ4 for pg_receivewal, walmethods.c has
>gained an extra parameter for the compression method. It gets used on
>the DirectoryMethod instead of the compression level to decide which
>type of compression is used. One thing that I left out during this
>previous work is that the TarMethod also gained knowledge of this
>compression method, but we still use the compression level to check if
>tars should be compressed or not.
>
>This is wrong on multiple aspects. First, this is not consistent with
>the directory method, making walmethods.c harder to figure out.
>Second, this is not extensible if we want to introduce more
>compression methods in pg_basebackup, like LZ4. This reflects on the
>options used by pg_receivewal and pg_basebackup, that are not
>inconsistent as well.

Agreed with all the above.

>The attached patch refactors the code of pg_basebackup and the
>TarMethod of walmethods.c to use the compression method where it
>should, splitting entirely the logic related the compression level.

Thanks.

>This is one step toward the introduction of LZ4 in pg_basebackup, but
>this refactoring is worth doing on its own, hence a separate thread to
>deal with this problem first. The options of pg_basebackup are
>reworked to be consistent with pg_receivewal, as follows:
>- --compress ranges now from 1 to 9, instead of 0 to 9.
>- --compression-method={none,gzip} is added, the default is none, same
>as HEAD.
>- --gzip/-z has the same meaning as before, being just a synonym of
>--compression-method=gzip with the default compression level of ZLIB
>assigned if there is no --compress.

Indeed this is consistent with pg_receivewal. It gets my +1.

>One more thing that I have noticed while hacking this stuff is that we
>have no regression tests for gzip with pg_basebackup, so I have added
>some that are skipped when not compiling the code with ZLIB.

As far as the code is concerned, I have a minor nitpick.

+ if (compression_method == COMPRESSION_NONE)
+ streamer = bbstreamer_plain_writer_new(archive_filename,
+ archive_file);
#ifdef HAVE_LIBZ
- if (compresslevel != 0)
+ else if (compression_method == COMPRESSION_GZIP)
{
strlcat(archive_filename, ".gz", sizeof(archive_filename));
streamer = bbstreamer_gzip_writer_new(archive_filename,
archive_file,
compresslevel);
}
- else
#endif
- streamer = bbstreamer_plain_writer_new(archive_filename,
- archive_file);
-
+ else
+ {
+ Assert(false); /* not reachable */
+ }

The above block moves the initialization of 'streamer' within two conditional
blocks. Despite this being correct, it is possible that some compilers will
complain for lack of initialization of 'streamer' when it is eventually used a
bit further ahead in:
if (must_parse_archive)
streamer = bbstreamer_tar_archiver_new(streamer);

I propose to initialize streamer to NULL when declared at the top of
CreateBackupStreamer().

As far as the tests are concerned, I think that 2 too many tests are skipped
when HAVE_LIBZ is not defined to be 1. The patch reads:

+Check ZLIB compression if available.
+SKIP:
+{
+ skip "postgres was not built with ZLIB support", 5
+ if (!check_pg_config("#define HAVE_LIBZ 1"));
+
+ $node->command_ok(
+ [
+ 'pg_basebackup', '-D',
+ "$tempdir/backup_gzip", '--compression-method',
+ 'gzip', '--compress', '1', '--no-sync', '--format', 't'
+ ],
+ 'pg_basebackup with --compress and --compression-method=gzip');
+
+ # Verify that the stored files are generated with their expected
+ # names.
+ my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
+ is(scalar(@zlib_files), 2,
+ "two files created with gzip (base.tar.gz and pg_wal.tar.gz)");
+
+ # Check the integrity of the files generated.
+ my $gzip = $ENV{GZIP_PROGRAM};
+ skip "program gzip is not found in your system", 1
+ if ( !defined $gzip
+ || $gzip eq ''
+ || system_log($gzip, '--version') != 0);
+
+ my $gzip_is_valid = system_log($gzip, '--test', @zlib_files);
+ is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
+ rmtree("$tempdir/backup_gzip");
+}

You can see that after the check_pg_config() test, only 3 tests follow,
namely:
* $node->command_ok()
* is(scalar(), ...)
* is($gzip_is_valid, ...)

>Opinions?

Other than the minor issues above, I think this is a solid improvement. +1

>--
>Michael

Cheers,
//Georgios

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gunnar "Nick" Bluth 2022-01-03 15:52:03 Re: [PATCH] pg_stat_toast v0.4
Previous Message Nikhil Benesch 2022-01-03 15:26:08 Re: Remove inconsistent quotes from date_part error