adding 'zstd' as a compression algorithm

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: adding 'zstd' as a compression algorithm
Date: 2022-02-15 18:20:32
Message-ID: CA+TgmoatQKGd+8SjcV+bzvw4XaoEwminHjU83yG12+NXtQzTTQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Over on the rather-long "refactoring basebackup.c" thread, there is a
proposal, which I endorse, to add base-backup compression via zstd. To
do that, we'd need to patch configure.ac to create a new --with-zstd
flag and appropriate supporting infrastructure. My colleague Jeevan
Ladhe included that in a patch posted over there; I've extracted just
the part adding configure support for libzstd and attach it here. I
thought it would be good to have a new thread specifically devoted to
the topic of whether zstd is a thing that PostgreSQL ought to support
in general.
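
To illustrate what "supporting infrastructure" could mean at the code
level: a configure switch like this typically ends up as a preprocessor
symbol that gates the optional code. The following is only a minimal
sketch, not taken from the attached patch; the macro names USE_ZSTD and
USE_LZ4 and the function compression_algorithm_available() are
assumptions made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch: USE_ZSTD is an assumed macro that configure would
 * define when run with --with-zstd; USE_LZ4 is assumed likewise for lz4.
 * Neither name is confirmed by the message above.
 */
bool
compression_algorithm_available(const char *name)
{
    /* pglz is built in, so it does not depend on any configure flag. */
    if (strcmp(name, "pglz") == 0)
        return true;

#ifdef USE_LZ4
    /* Compiled in only when the build was configured with lz4 support. */
    if (strcmp(name, "lz4") == 0)
        return true;
#endif

#ifdef USE_ZSTD
    /* Compiled in only when the build was configured with zstd support. */
    if (strcmp(name, "zstd") == 0)
        return true;
#endif

    /* Unknown algorithm, or one that was not enabled at build time. */
    return false;
}
```

A caller could use such a check to report a clear error when zstd is
requested from a build configured without --with-zstd.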

Deciding on new compression algorithms can feel a bit like
debating the merits of vi vs. emacs, or one political party vs.
another. Everyone has their own favorites, for reasons that can often
seem idiosyncratic. One advantage of zstd is that it is already being
used by other prominent open-source projects, including the Linux
kernel.[1] This means that it is unlikely to just dry up and vanish,
and it also reduces the risk of legal issues. On a technical level,
zstd offers compression ratios similar to or better than gzip, but
with much faster compression speed. Furthermore, the zstd library has
built-in multi-threaded compression which we may be able to leverage
for even better performance. In fact, parallel zstd might be able to
compress faster than lz4, which is already extremely fast.

What I imagine if this patch is accepted is that we (or our users)
will end up using lz4 for places where compression needs to be very
lightweight, and zstd for places where it's acceptable or even
desirable to spend more CPU cycles in exchange for better compression.
I think that gzip and pglz are mostly of historical interest at this
point, which is not to say that we should stop supporting them or
that they won't get used. Lots of people are perfectly happy
with TOAST compression using pglz, and I'm perfectly happy if they
continue to do that forever, even though I'm glad LZ4 is now an
option. Likewise, I still download the .tar.gz version of anything
that gives me that option, basically because I'm familiar with the
format and it's easy for me to just carry on using it -- and in a
similar way I expect a lot of people will be happy to continue to
compress backups with gzip for many years to come. But I think there
is value in supporting newer and better technology, too. I realize
that we don't want to support every new and shiny thing that shows up,
but I don't think that's what I am proposing here.

Anyway, those are my thoughts. What are yours?

Thanks,

[1] https://en.wikipedia.org/wiki/Zstd#Usage

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment: 0001-Add-support-for-building-with-ZSTD.patch (application/octet-stream, 13.1 KB)
