Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-07 06:46:41
Message-ID: CAFiTN-sB2oamJ6NoDW6+L6Jf7MgMMUmT2s0ACGEito+-=Hw_yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 7, 2021 at 1:27 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Sat, Mar 06, 2021 at 08:59:16PM +0530, Dilip Kumar wrote:
> > - Alter table set compression, will not rewrite the old data, so only
> > the new tuple will be compressed with the new compression method.
> > - No preserve.
>
> +1, this simplifies things. If someone *wants* to rewrite the table, they can
> VACUUM FULL, CLUSTER, or dump+restore.

Thanks

> I checked that it's possible to do simple column manipulations on columns
> written with-lz4 with binaries built without-lz4:
> - is null
> - texteq if length differs
> - explain analyze

That's because unless you really need to compress/decompress the data
it won't error out. IMHO this behavior is fine as this is the
behavior with other libraries also e.g libxml

> If I pg_upgrade from an binary with-lz4 to one without-lz4, it fails
> while restoring the schema, after running check, which is bad:
> | pg_restore: error: could not execute query: ERROR: not built with lz4 support
> |CREATE TABLE "public"."a" (
> | "t" "text" COMPRESSION lz4,
>
> For comparison, upgrading from binaries with-libxml to binaries without-libxml
> actualy passes pg_upgrade.
>
> It's arguable which behavior is desirable:
> - allow CREATE TABLE(..COMPRESSION lz4) during pg_upgrade;
> - allow CREATE TABLE(..COMPRESSION lz4) always. This has the advantage that
> GetAttributeCompression() doesn't have conditional compilation. This seems
> to be parallel to the libxml case - apparently, it's possible to create an
> XML column, but not insert into it.

IMHO we can always allow creating the table with lz4 and only error
out when we really need to compress/decompress the data. I like this
behavior because it is the same as libxml. But I am fine with
allowing it only in binary upgrade also. Another option could be to
fall back to default "pglz" in binary upgrade mode if it is built
without-lz4 but the problem is this will change the table
specification after the upgrade. So maybe we can go with either of
the first two options. Any other thoughts on this?

> - abort pg_upgrade --check if the old cluster has lz4 and the new one doesn't,
> if there are any lz4 compressed columns. This avoids the possibilty of
> running an upgrade to binaries without lz4, starting a new cluster (which
> leaves the old cluster unsafe to start if --link was used), and then the new
> cluster may even appear to work, until an LZ4 column is accessed in a
> nontrivial way. It has the disadvantage that there's no obvious parallel in
> pg_upgrade (checksums and xml are the closest?). And the disadvantage that
> some people might *want* the upgrade to succeed in that case to then recompile
> with lz4 afterwards.

Yeah.

> In this patch, SET default_toast_compression=lz4 "works" even if without-lz4,
> but then CREATE TABLE fails. You should either allow table creation (as
> above), or check in check_default_toast_compression() if lz4 is enabled.
> Its comment about "catalog access" is incorrect now.

I will fix the comment, I agree that only if we allow to always
create a table then only it makes sense to set the default as lz4 if
it is compiled without lz4.

> Now, I wonder if default_toast_compression should be a GUC, or a reloption.
> An obvious advantage of being a GUC is that regression tests are trivial with
> make installcheck.

I don't think it makes much sense to give a table-wise option, we
anyways have the option to give a compression method per attribute and
I think selecting the compression method more depends upon the
attribute data and type. So I think providing the GUC makes more
sense when the user wants to select some default compression method
for most of its attributes.

> Some minor fixes:
>
> + if (strcmp(def->compression, newdef->compression))
> != 0
>
> + * NULL for non varlena type or the uncompressed data.
> remove "the"
>
> + * InvalidOid for the plain/external storage otherwise default
> remove "the"
>
> + behavior is to exclude compression methods, resulting in the columns
> remove "the"
>
>
> + attcompression and attstorage for the respective index attribute if ... the respective input values are
> say "and/or attstorage"
>
> + If this variable is set to <literal>true</literal>, column's [...]
> I wrote this, but I guess it should say: columns'
>
> I think you could also say either of these:
> .. column compression method details are not displayed.
> .. details of column compression are not displayed.
>

Thanks, I will fix these in the next version.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-03-07 07:17:03 Re: [HACKERS] Custom compression methods
Previous Message Japin Li 2021-03-07 06:43:22 Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW