Re: [HACKERS] Custom compression methods

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-01-30 07:16:39
Message-ID: 20210130071638.GG7450@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> + * Helper function for ATExecSetStorage and ATExecSetCompression
> + *
> + * Set the attcompression or attstorage for the respective index attribute. If
> + * attcompression is a valid oid then it will update the attcompression for the
> + * index attribute otherwise it will update the attstorage.
> + */
> +static void
> +ApplyChangesToIndexes(Relation rel, Relation attrelation, AttrNumber attnum,
> + Oid newcompression, char newstorage, LOCKMODE lockmode)

The comment should refer to "newcompression" not attcompression.
Maybe you'd want to assert that exactly one of these is valid.
Or maybe you should allow setting *both* if they're both valid.

> case AMTYPE_TABLE:
> return "TABLE";
>+ case AMTYPE_COMPRESSION:
>+ return "TABLE";

Looks like this part got confused, maybe during rebase.

> + if (strcmp(def->defname, "acceleration") == 0)
> + {
> + int32 acceleration =
> + pg_atoi(defGetString(def), sizeof(acceleration), 0);
> +
> + if (acceleration < INT32_MIN || acceleration > INT32_MAX)

Is this even reachable given that it's int32 type ?

What do you think about this idea:

On Sun, Jan 03, 2021 at 07:22:20PM -0600, Justin Pryzby wrote:
> I think there should also be an option for pg_restore, like --no-tablespaces.
> And I think there should be a GUC for default_compression, like
> default_table_access_method, so one can restore into an alternate compression
> by setting PGOPTIONS=-cdefault_compression=lz4.
>
> I'd like to be able to make all compressible columns of a table use a
> non-default compression (except those which cannot), without having to use
> \gexec... We have tables with up to 1600 columns. So a GUC would allow that.
>
> Previously (on separate threads) I wondered whether pg_dump
> --no-table-access-method was needed - maybe that be sufficient for this case,
> too, but I think it should be possible to separately avoid restoring
> compression AM and AM "proper". So maybe it'd be like --no-tableam=compress
> --no-tableam=storage or --no-tableam-all.

I implemented the GUC. I'm sending a patchset rebased with this as 0002, and
an un-rebased version in case you prefer that.

I think the pg_dump half (--no-acccess-method=compress) is independently useful
for tableams.

> > Your first patch is large due to updating a large number of test cases to
> > include the "compression" column in \d+ output. Maybe that column should be
> > hidden when HIDE_TABLEAM is set by pg_regress ? I think that would allow
> > testing with alternate, default compression.
>
> I am not sure whether we should hide the compression method when
> HIDE_TABLEAM is set. I agree that it is actually an access method but
> it is not the same right? Because we are using it for compression not
> for storing data.

You're right that TABLEAM should only be for table AM's. So I suggest to do
something similar for HIDE_COMPRESSAM. That allows testing with an alternate,
default compression (which is possible with the added GUC).

Language/style fixen to follow:

> + /* acoid should be found in some cases */
amoid

> +-- test alter compression method for the partioned table
partitioned

> + /*
> + * Loop for all the attributes in the tuple and check if any of the
> + * attribute is compressed in the source tuple and its compression method
> + * is not same as the target compression method then we need to decompress
> + * it.

Loop *over*
any attribute is compressed (or any of the attributes *are* compressed)

> + * If we have decompressed any of the field in the source tuple then free
> + * the existing tuple.

any field (or any of the fields)

> + /* Likewise, copy compression if requested */
> + if (table_like_clause->options & CREATE_TABLE_LIKE_COMPRESSION
> + && OidIsValid(attribute->attcompression))

Should say (.. & ..) != 0 && OidIsValid()

> + else if (strcmp(def->compression, compression))

I think postgres likes you to explicitly write != 0

> + n->def = (Node *) makeString($5);;
The 0002 patch has this double semicolon, which is removed by 0003.

> + By default, each column in a partition inherits the compression method
> + from its parent table, however a different compression method can be set
> + for each partition.

I guess it should say that it inherits from the parent table's *column*.

> This clause specifies the type of access method to define.
> - Only <literal>TABLE</literal> and <literal>INDEX</literal>
> + <literal>TABLE</literal>, <literal>INDEX</literal> and <literal>COMPRESSION</literal>
> are supported at present.

I think "ONLY" got lost (?)
Maybe it should say: "Currently, TABLE, INDEX, and COMPRESSION access methods
are supported.

--
Justin

Attachment Content-Type Size
v21-0001-Built-in-compression-method.patch text/x-diff 234.2 KB
v21-0002-Add-default_toast_compression-GUC.patch text/x-diff 8.1 KB
v21-0003-alter-table-set-compression.patch text/x-diff 20.4 KB
v21-0004-Add-support-for-PRESERVE.patch text/x-diff 52.6 KB
v21-0005-Create-custom-compression-methods.patch text/x-diff 33.3 KB
v21-0006-new-compression-method-extension-for-zlib.patch text/x-diff 10.0 KB
v21-0007-Support-compression-methods-options.patch text/x-diff 64.6 KB
0001-Add-default_toast_compression-GUC.beforerebase text/x-diff 8.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-01-30 08:51:03 Re: Allow CURRENT_ROLE in GRANTED BY
Previous Message Amit Kapila 2021-01-30 04:40:58 Re: [PATCH] pg_hba.conf error messages for logical replication connections