Small issues with CREATE TABLE COMPRESSION

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Jacob Champion <pchampion(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Small issues with CREATE TABLE COMPRESSION
Date: 2021-04-27 06:22:25
Message-ID: YIetoZGq31L84v5d@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

I have been looking at and testing the patch set for CREATE TABLE
COMPRESSION, and spotted a couple of things in parallel of some work
done by Jacob (added in CC).

The behavior around CREATE TABLE AS and matviews is a bit confusing,
and not documented. First, at the grammar level, it is not possible
to specify which compression option is used per column when creating
the relation. So, all the relation columns would just set a column's
compression to be default_toast_compression for all the toastable
columns of the relation. That's not enforceable at column level when
the relation is created, except with a follow-up ALTER TABLE. That's
similar to STORAGE when it comes to matviews, but these are at least
documented.

And so, ALTER MATERIALIZED VIEW supports SET COMPRESSION but this is
not mentioned in its docs:
https://www.postgresql.org/docs/devel/sql-altermaterializedview.html
psql could have tab completion support for that.

There are no tests in pg_dump to make sure that some ALTER
MATERIALIZED VIEW or ALTER TABLE commands are generated when the
compression of a matview's or table's column is changed. This depends
on the value of default_toast_compression, but that would be nice to
have something, and get at least some coverage with
--no-toast-compression. You would need to make the tests conditional
here, for example with check_pg_config() (see for example what's done
with channel binding in ssl/t/002_scram.pl).

Another thing is the handling of the per-value compression that could
be confusing to the user. As no materialization of the data is done
for a CTAS or a matview, and the regression tests of compression.sql
track that AFAIK, there can be a mix of toast values compressed with
lz4 or pglz, with pg_attribute.attcompression being one or the other.

Now, we don't really document any of that, and the per-column
compression value would be set to default_toast_compression while the
stored values may use a mix of the compression methods, depending on
where the toasted values come from. If this behavior is intended, this
makes me wonder in what the possibility to set the compression for a
materialized view column is useful for except for a logical
dump/restore? As of HEAD we'd just insert the toasted value from the
origin as-is so the compression of the column has no effect at all.
Another thing here is the inconsistency that this brings with pg_dump.
For example, as the dumped values are decompressed, we could have
values compressed with pglz at the origin, with a column using lz4
within its definition that would make everything compressed with lz4
once the values are restored. This choice may be fine, but I think
that it would be good to document all that. That would be less
surprising to the user.

Similarly, we may want to document that COMPRESSION does not trigger a
table rewrite, but that it is effective only for the new toast values
inserted if a tuple is rebuilt and rewritten?

Would it be better to document that pg_column_compression() returns
NULL if the column is not a toastable type or if the column's value is
not compressed?

The flexibility with allow_system_table_mods allows one to change the
compression method of catalogs, for example switching rolpassword with
a SCRAM verifier large enough to be toasted would lock an access to
the cluster if restarting the server without lz4 support. I shouldn't
have done that but I did, and I like it :)

The design used by this feature is pretty cool, as long as you don't
read the compressed values, physical replication can work out of the
box even across nodes that are built with or without lz4.

Thanks,
--
Michael

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-04-27 06:25:04 Re: compute_query_id and pg_stat_statements
Previous Message Dilip Kumar 2021-04-27 06:20:01 Re: [BUG] "FailedAssertion" reported when streaming in logical replication