Re: [HACKERS] Custom compression methods

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-08 20:26:04
Message-ID: CA+TgmoZ9sWS_8OHqvTAhmRX-2PoK4UMFiSU4QgKbhAosT1RTzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 8, 2021 at 5:02 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> So now only pending point is, how do we handle the upgrade when you
> are upgrading from --with-lz4 to --without-lz4 binary and a couple of
> options discussed here are
> a) Should we allow table creation with lz4 even if it is compiled
> --without-lz4? In case of xml we always allow table creation even if
> it is compiled --wthout-libxml
> b) Instead of allowing this always, only allow during binary upgrade.

I think the basic answer to (a) that it doesn't matter. Suppose the
user is not upgrading but just feels like creating a table that is
configured to use LZ4 compression. Does it really matter whether they
get the error when they create the table or when they load the data?
Personally, I think it is slightly more user-friendly to give the
error when they try to create the table, because the problem doesn't
occur when inserting ANY row, but only when inserting rows that are
wide enough that compression will occur. It's not that great to have
people create a table and then find out only much later that it
doesn't work. On the other hand, consistency with the way the xml data
type already works seems like a fair enough argument for letting the
error happen when we try to actually use the compression method. So I
can't get worked up about it either way.

Regarding (b), it seems to me that with this approach, we have to
document that pg_upgrade from binaries that support LZ4 to binaries
that don't support LZ4 is fundamentally unsafe. You might have
LZ4-compressed values in your columns even if they are now set to use
PGLZ, and you might have LZ4'd data inside composite values that are
on disk someplace. We have no idea whether those things are true or
not, and we can't prevent you from upgrading to something that makes
part of your data inaccessible. Given that, if we go with this
approach, I think we should expend exactly 0 code trying to making
pg_upgrade pass in cases where there are LZ4 columns in the database
and the new binaries don't support LZ4. Just because the user goes and
gets rid of all the LZ4 columns before upgrading doesn't mean that the
upgrade is safe, but if they haven't even done that much, maybe they
should reconsider things a bit.

Some other review comments:

toast_get_compression_method() should now return char, not Oid.

With this design, we can support changing the compression method on a
column quite easily. It's just a hint, like the STORAGE parameter. It
has no bearing on what can be present in the table, but just controls
how new values are stored. It would be nice to have a way to force
anything compressed with the old method to be re-compressed with the
new method, but not having that doesn't preclude allowing the
parameter to be changed.

I am tempted to propose that we collapse compress_lz4.c and
compress_pglz.c into a single file, get rid of the directory, and just
have something like src/backend/access/common/toast_compression.c. The
files are awfully short, and making a whole new directory for that
small amount of code seems like overkill.

I think the pg_dump argument should be --no-toast-compression, not
--no-toast-compressions. I agree with Justin that pg_restore should
have the option also.

Man, it would be really nice to be able to set the default for new
tables, rather than having all these places hard-coded to use
DefaultCompressionMethod. Surely lotsa people are going to want to set
toast_compression = lz4 in postgresql.conf and forget about it.

Is there any reason not to change varattrib_4b's description of
va_tcinfo that says "and flags" to instead say "and compression
method"? And rename VARFLAGS_4B_C to VARCOMPRESS_4B_C? I don't know
why we should call it flags when we know it's specifically compression
information.

You should probably have a test that involves altering the type of a
varlena column to non-varlena, and the other way around, and make sure
that changing integer -> text sets attcompression and doing the
reverse clears it.

You need to update catalogs.sgml.

On the whole I don't see a whole lot to complain about here. I don't
love giving up on the idea of tracking which compression methods are
used where, but making that work without performance regressions seems
very difficult and perhaps just outright impossible, and dealing with
all the concurrency problems that introduces is a pain, too. I think
accepting a feature that gives us LZ4 compression is better than
rejecting it because we can't solve those problems.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-03-08 20:30:16 Re: pg_amcheck contrib application
Previous Message Tom Lane 2021-03-08 20:12:39 Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]