Re: [HACKERS] Custom compression methods

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-20 00:06:25
Message-ID: 20210320000625.GP11765@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 19, 2021 at 04:38:03PM -0400, Robert Haas wrote:
> On Fri, Mar 19, 2021 at 4:20 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Nope ... crake's displeased with your assumption that it's OK to
> > clutter dumps with COMPRESSION clauses. As am I: that is going to
> > be utterly fatal for cross-version transportation of dumps.
>
> Regarding your point, that does look like clutter. We don't annotate
> the dump with a storage clause unless it's non-default, so probably we
> should do the same thing here. I think I gave Dilip bad advice here...

On Fri, Mar 19, 2021 at 05:49:37PM -0400, Robert Haas wrote:
> Here's a patch for that. It's a little strange because you're going to
> skip dumping the toast compression based on the default value on the
> source system, but that might not be the default on the system where
> the dump is being restored, so you could fail to recreate the state
> you had. That is avoidable if you understand how things work, but some
> people might not. I don't have a better idea, though, so let me know
> what you think of this.

On Fri, Mar 19, 2021 at 07:22:42PM -0300, Alvaro Herrera wrote:
> Do you mean the column storage strategy, attstorage? I don't think
> that's really related, because the difference there is not a GUC setting
> but a compiled-in default for the type. In the case of compression, I'm
> not sure it makes sense to do it like that, but I can see the clutter
> argument: if we dump compression for all columns, it's going to be super
> noisy.
>
> (At least, for binary upgrade surely you must make sure to apply the
> correct setting regardless of defaults on either system).
>
> Maybe it makes sense to dump the compression clause if it is different
> from pglz, regardless of the default on the source server. Then, if the
> target server has chosen lz4 as default, *all* columns are going to end
> up as lz4, and if it hasn't, then only the ones that were lz4 in the
> source server are going to. That seems reasonable behavior. Also, if
> some columns are lz4 in source, and target does not have lz4, then
> everything is going to work out to not-lz4 with just a bunch of errors
> in the output.

I think what's missing is dumping the GUC value itself, and then also dump any
columns that differ from the GUC's setting. An early version of the GUC patch
actually had an "XXX" comment about pg_dump support, and I was waiting for a
review before polishing it. This was modelled after default_tablespace and
default_table_access_method - I've mentioned that before that there's no
pg_restore --no-table-am, and I have an unpublished patch to add it. That may
be how I missed this until now.

Then, this will output COMPRESSION on "a" (x)or "b" depending on the current
default:
| CREATE TABLE a(a text compression lz4, b text compression pglz);
When we restore it, we set the default before restoring columns.

I think it may be a good idea to document that dumps of columns with
non-default compression aren't portable to older server versions, or servers
--without-lz4. This is a consequence of the CREATE command being a big text
blob, so pg_restore can't reasonably elide the COMPRESSION clause.

While looking at this, I realized that someone added the GUC to
postgresql.conf.sample, but not to doc/ - this was a separate patch until
yesterday.

I think since we're not doing catalog access for "pluggable" compression, this
should just be an enum GUC, with #ifdef LZ4. Then we don't need a hook to
validate it.

ALTER and CREATE are silently accepting bogus compression names.

I can write patches for these later.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-03-20 00:10:54 Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Previous Message Hannu Krosing 2021-03-20 00:03:16 Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?