Add LZ4 compression in pg_dump

From: Georgios <gkokolatos(at)protonmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Add LZ4 compression in pg_dump
Date: 2022-02-25 12:05:31
Message-ID: faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss=@protonmail.com
Lists: pgsql-hackers

Hi,

please find attached a patchset that adds LZ4 compression to pg_dump.

The first commit does the heavy lifting required for additional compression methods.
It also expands test coverage for the already supported gzip compression. Commit
bf9aa490db introduced cfp in compress_io.{c,h} with the intent of unifying
compression-related code and allowing for the introduction of additional archive
formats. However, pg_backup_archiver.c was not using that API. This commit
teaches pg_backup_archiver.c about cfp and uses it throughout.

Furthermore, compression was chosen based on the value of the level passed
as an argument during the invocation of pg_dump, or on some hardcoded defaults. This
does not scale to more than one compression method. Now the compression method
can be explicitly requested during command invocation, or set via the
hardcoded defaults. It is then stored in the relevant structs and passed to the
relevant functions, alongside the compression level, which has lost its special
meaning. The compression method is not yet stored in the actual archive;
this is done in the next commit, which introduces a new method.

The previously named CompressionAlgorithm enum is renamed to
CompressionMethod so that it better matches similar variables found throughout
the code base.
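A minimal sketch of carrying the method and level together, using the CompressionMethod name adopted above. CompressionSpec, ArchiveFormat and default_compression() are hypothetical names, not the patch's definitions. The defaults follow pg_dump's documented behaviour (custom and directory archives are gzip-compressed when zlib is available, other formats are not), and level -1 is assumed to mean "library default", mirroring zlib's Z_DEFAULT_COMPRESSION.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { COMPRESSION_NONE, COMPRESSION_GZIP, COMPRESSION_LZ4 } CompressionMethod;

/* Method and level travel together through the dump/restore code paths. */
typedef struct
{
    CompressionMethod method;
    int         level;          /* -1: library default, like zlib's Z_DEFAULT_COMPRESSION */
} CompressionSpec;

typedef enum { FORMAT_PLAIN, FORMAT_CUSTOM, FORMAT_DIRECTORY, FORMAT_TAR } ArchiveFormat;

/*
 * Hardcoded default used when -Z/--compress is not given: custom and
 * directory archives are gzip-compressed when zlib is available,
 * everything else is left uncompressed.
 */
static CompressionSpec
default_compression(ArchiveFormat fmt, bool have_zlib)
{
    CompressionSpec spec = {COMPRESSION_NONE, 0};

    if (have_zlib && (fmt == FORMAT_CUSTOM || fmt == FORMAT_DIRECTORY))
    {
        spec.method = COMPRESSION_GZIP;
        spec.level = -1;
    }
    /* Invariant: "no compression" is always paired with level 0. */
    assert(spec.method != COMPRESSION_NONE || spec.level == 0);
    return spec;
}
```

Because the method is an explicit field, the level no longer has to encode "compress or not", which is the special meaning it loses in this commit.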

In a fashion similar to pg_basebackup, the compression method
is passed using the already existing -Z/--compress parameter of pg_dump. The
legacy format and behaviour are maintained. Additionally, the user can explicitly
pass a requested method and, optionally, the level to be used after a colon, e.g. --compress=gzip:6
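A minimal sketch of how such an argument could be parsed, assuming the accepted forms are a bare legacy level (0-9), a bare method name, or method:level. parse_compress_option(), CompressionSpec and the convention that level -1 means "method default" are hypothetical, and method-specific level ranges are deliberately not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef enum { COMPRESSION_NONE, COMPRESSION_GZIP, COMPRESSION_LZ4 } CompressionMethod;

/* Hypothetical: level -1 means "use the method's default level". */
typedef struct { CompressionMethod method; int level; } CompressionSpec;

/* Parse 1-3 decimal digits from s[0..len); false on anything else. */
static bool
parse_level(const char *s, size_t len, int *level)
{
    int         v = 0;

    if (len == 0 || len > 3)
        return false;
    for (size_t i = 0; i < len; i++)
    {
        if (!isdigit((unsigned char) s[i]))
            return false;
        v = v * 10 + (s[i] - '0');
    }
    *level = v;
    return true;
}

static bool
parse_method(const char *s, size_t len, CompressionMethod *m)
{
    if (len == 4 && strncmp(s, "none", 4) == 0)
        *m = COMPRESSION_NONE;
    else if (len == 4 && strncmp(s, "gzip", 4) == 0)
        *m = COMPRESSION_GZIP;
    else if (len == 3 && strncmp(s, "lz4", 3) == 0)
        *m = COMPRESSION_LZ4;
    else
        return false;
    return true;
}

/*
 * Accepts "N" (legacy: 0 = none, 1..9 = gzip at level N), "METHOD"
 * (default level) and "METHOD:N". Returns false on malformed input.
 */
static bool
parse_compress_option(const char *arg, CompressionSpec *spec)
{
    const char *colon;
    int         level;

    assert(arg != NULL && spec != NULL);
    colon = strchr(arg, ':');

    if (colon == NULL)
    {
        if (parse_level(arg, strlen(arg), &level))
        {
            /* Legacy form: a bare level keeps its old meaning. */
            if (level > 9)
                return false;
            spec->method = (level == 0) ? COMPRESSION_NONE : COMPRESSION_GZIP;
            spec->level = level;
            return true;
        }
        if (!parse_method(arg, strlen(arg), &spec->method))
            return false;
        spec->level = -1;
        return true;
    }

    if (!parse_method(arg, (size_t) (colon - arg), &spec->method) ||
        !parse_level(colon + 1, strlen(colon + 1), &level))
        return false;
    if (spec->method == COMPRESSION_NONE && level != 0)
        return false;           /* "none" only makes sense with level 0 */
    spec->level = level;
    return true;
}
```

Under these assumptions `-Z 6` and `--compress=gzip:6` yield the same spec, while `--compress=lz4` selects LZ4 at its default level.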

The second commit adds LZ4 compression in pg_dump and pg_restore.

Within compress_io.{c,h} there are two distinct APIs exposed: the streaming API
and a file API. The first one is aimed at inlined use cases, so simple
lz4.h calls can be used directly. The second one generates output, or
parses input, that can be read or generated by the lz4 utility.

In the latter case, the API uses an opaque wrapper around a file stream,
acquired via fopen() or gzopen() respectively. It then provides
wrappers around fread(), fwrite(), fgets(), fgetc(), feof(), and fclose(), or
their gz equivalents. However, the LZ4F API does not provide this functionality,
so it has been implemented locally.
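A minimal sketch of such an opaque wrapper, assuming a table of function pointers. CFH, its *_fn slots and cfh_wrap_stdio() are hypothetical names, and only the uncompressed stdio backend is shown (fgets() omitted for brevity). A gzip backend could fill the same slots with thin adapters over zlib's gzread(), gzwrite(), gzgetc(), gzeof() and gzclose(); an LZ4 backend would fill them with locally written functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical opaque handle: callers only ever go through these slots. */
typedef struct CFH CFH;
struct CFH
{
    size_t      (*read_fn) (CFH *fh, void *buf, size_t size);
    size_t      (*write_fn) (CFH *fh, const void *buf, size_t size);
    int         (*getc_fn) (CFH *fh);
    int         (*eof_fn) (CFH *fh);
    int         (*close_fn) (CFH *fh);
    void       *private_data;   /* FILE *, gzFile, or an LZ4 state struct */
};

/* Uncompressed backend: thin wrappers around stdio. */
static size_t
plain_read(CFH *fh, void *buf, size_t size)
{
    return fread(buf, 1, size, (FILE *) fh->private_data);
}

static size_t
plain_write(CFH *fh, const void *buf, size_t size)
{
    return fwrite(buf, 1, size, (FILE *) fh->private_data);
}

static int
plain_getc(CFH *fh)
{
    return fgetc((FILE *) fh->private_data);
}

static int
plain_eof(CFH *fh)
{
    return feof((FILE *) fh->private_data);
}

/* Closes the underlying stream and frees the handle itself. */
static int
plain_close(CFH *fh)
{
    int         rc = fclose((FILE *) fh->private_data);

    free(fh);
    return rc;
}

static CFH *
cfh_wrap_stdio(FILE *fp)
{
    CFH        *fh;

    assert(fp != NULL);
    fh = malloc(sizeof(CFH));
    if (fh == NULL)
        return NULL;
    fh->read_fn = plain_read;
    fh->write_fn = plain_write;
    fh->getc_fn = plain_getc;
    fh->eof_fn = plain_eof;
    fh->close_fn = plain_close;
    fh->private_data = fp;
    return fh;
}
```

The archiver code then calls fh->read_fn(fh, ...) and friends without knowing which compression method sits underneath.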

In order to maintain API compatibility, a new structure, LZ4File, is
introduced. It is responsible for keeping state and any generated content not
yet consumed. The latter is required when the decompressed output exceeds
the caller's buffer capacity.
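A minimal sketch of the "keep unused generated content" idea: a read first drains bytes left over from the previous call, then decompresses more and stashes whatever does not fit in the caller's buffer. LZ4FileSketch and lz4file_read() are hypothetical names, and fake_decompress_chunk() is an assumed stand-in (a plain copy) for a real LZ4F_decompress() step so the sketch runs without liblz4.

```c
#include <assert.h>
#include <string.h>

#define CHUNK_CAP 64

/* Hypothetical state, modeled on the LZ4File structure described above. */
typedef struct
{
    const char *src;            /* stand-in for the compressed input stream */
    size_t      src_len;
    size_t      src_pos;
    char        overflow[CHUNK_CAP];    /* decompressed but not yet returned */
    size_t      overflow_len;
} LZ4FileSketch;

/* Stand-in for one decompression step: copies up to cap bytes of input. */
static size_t
fake_decompress_chunk(LZ4FileSketch *fs, char *out, size_t cap)
{
    size_t      n = fs->src_len - fs->src_pos;

    if (n > cap)
        n = cap;
    memcpy(out, fs->src + fs->src_pos, n);
    fs->src_pos += n;
    return n;
}

/* fread()-like: returns up to size bytes, 0 at end of input. */
static size_t
lz4file_read(LZ4FileSketch *fs, char *buf, size_t size)
{
    size_t      done = 0;

    /* 1. Serve bytes left over from a previous call first. */
    if (fs->overflow_len > 0)
    {
        size_t      n = fs->overflow_len < size ? fs->overflow_len : size;

        memcpy(buf, fs->overflow, n);
        memmove(fs->overflow, fs->overflow + n, fs->overflow_len - n);
        fs->overflow_len -= n;
        done = n;
    }

    /* 2. Decompress more; whatever does not fit in buf is stashed. */
    while (done < size)
    {
        char        chunk[CHUNK_CAP];
        size_t      got = fake_decompress_chunk(fs, chunk, sizeof(chunk));
        size_t      fit;

        if (got == 0)
            break;              /* end of input */
        fit = got < size - done ? got : size - done;
        memcpy(buf + done, chunk, fit);
        done += fit;
        assert(fs->overflow_len == 0);  /* drained in step 1 */
        memcpy(fs->overflow, chunk + fit, got - fit);
        fs->overflow_len = got - fit;
    }
    return done;
}
```

With a 5-byte caller buffer and a 16-byte decompressed chunk, the first call returns 5 bytes and keeps the remaining 11 for subsequent calls.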

Compressed custom archives now need to store the compression method in their
header. This requires a bump in the version number. The compression level is
still stored in the dump, though admittedly it is of no apparent use.

The series is authored by me. Rachel Heaton helped out with expanding
the test coverage, testing on different platforms and providing debug information
from those, as well as with native-speaker wording.

Cheers,
//Georgios

Attachment Content-Type Size
v1-0001-Prepare-pg_dump-for-additional-compression-method.patch text/x-patch 54.5 KB
v1-0002-Add-LZ4-compression-in-pg_-dump-restore.patch text/x-patch 43.9 KB
