Re: adding 'zstd' as a compression algorithm

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: adding 'zstd' as a compression algorithm
Date: 2022-02-15 22:10:47
Message-ID: CAH2-WzmsBX_7fdSMmuPDAZg9v+4bfbdgBOVeM4LJcx2KmbtPQg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 15, 2022 at 12:00 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm not sure I completely follow this. There are cases where we use
> compression algorithms for internal purposes, and we can change the
> defaults without users knowing or caring. For example, we could decide
> that LZ4-compressing FPIs in WAL records is a sensible default and
> just start doing it, and users need not care. But backups are
> different, because when you pg_basebackup -Ft, you get .tar or .tar.gz
> or .tar.lz4 files which we don't give you tools to extract.
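
(For concreteness, the kind of invocation at issue looks something like
the following; the option spelling here assumes the PG 15-era
pg_basebackup, so treat it as a sketch rather than exact syntax:

    pg_basebackup -D /path/to/backup -Ft --compress=lz4

That leaves you with files such as base.tar.lz4, which today you have
to unpack yourself with an external lz4 tool plus tar, since core ships
no extractor for them.)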

What I meant is that you're buying into an ecosystem by choosing to
use a tool like pgBackRest. That might not be the right thing to do
universally, but it comes pretty close. You as a user are largely
deferring to the maintainers and their choice of defaults. Not
entirely, of course, but to a significant degree, especially in
matters like this. There aren't that many dimensions to the problem
once the question of compatibility is settled (it's settled here
because, say, you've already bought into a tool that requires the
library as standard).

A lot of things are *not* like that, but ISTM that backup compression
really is -- we have the luxury of a constrained problem space, for
once. There aren't that many metrics to consider, because it must be
lossless compression in any case, and because the requirements are
relatively homogeneous. The truly important thing (once compatibility
is accounted for) is to use something basically reasonable, with no
noticeable weaknesses relative to any of the alternatives.

> I pretty much agree with all of that. Nevertheless, there's a lot of
> pglz-compressed data that's already on a disk someplace which people
> are not necessarily keen to rewrite, and that will probably remain
> true for the foreseeable future. If we change the default to something
> that's not pglz, and then wait 10 years, we MIGHT be able to get rid
> of it without pushback. But I wouldn't be surprised to learn that even
> then there are a lot of people who have just pg_upgrade'd the same
> database over and over again. And honestly I think that's fine.

I think that it's fine too. It's unlikely that anybody is going to go
to any trouble to get better compression, certainly, but we should
give them every opportunity to use better alternatives. Ideally
without anybody having to even think about it.

--
Peter Geoghegan
