Re: [HACKERS] Custom compression methods

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-08 17:29:16
Message-ID: 20210308172915.GF29832@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 07, 2021 at 06:04:41PM +0530, Dilip Kumar wrote:
> On Sun, Mar 7, 2021 at 2:19 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > Earlier in this thread, I suggested to implement an option to pg_restore to
> > avoid outputting compression, in order to allow restoring with a different
> > compression (by using the default_toast_compression GUC). Now, it seems like
> > that's even more important, to allow restoring into binaries --without-lz4.
> > (the pg_dump isn't in LZ4 format, it just needs to not say "COMPRESSION LZ4").
>
> IMHO, we have an option with pg_dump that should be sufficient, no?

I think it's insufficient since people may be unable to restore from backup, or
can only restore backup by resorting to this:
pg_restore -f- |sed 's/COMPRESSION lz4//' |psql -d mydb

I think there's a parallel with --no-tablespaces. But if a tablespace is
missing/renamed, the table is still restored (with errors during SET
default_tablespace), whereas if lz4 is missing, the table is not restored.
Or actually the table would be created, but the COPY/INSERTs would fail.

There's an argument to be made that this is already an issue - for example,
I've numerous times done a partial restore of a single partition, where the
column types have changed in the parent, and I need to use sed to restore the
partition. However, that's improving - in v14: "attach table" is a separate
pg_dump object, so the table *is* restored, and only the ATTACH command fails.
(See 9a4c0e36f).

I wonder if COMPRESSION should be dumped as ALTER statements, not in the
CREATE. In fact, the CREATE syntax is optional and could be removed. Similar
to ALTER TABLE t ALTER c SET STATISTICS 99 - there's no CREATE grammar for
that.

Note that I think that using ALTER doesn't resolves this issue, since the
createStmt is sent using the simple query protocol (PQexec), which means that
all its commands are executed as a single transaction, and if the ALTER to LZ4
fails, so does the preceding CREATE. This is the same issue I see with
"CREATE..ATTACH PARTITION", above.

> but I agree that having such an option with restore will give more
> flexibility basically, by using the same dump we can restore to binary
> --with-lz4 as well as without-lz4 if such option exists with restore
> as well. But it seems in pg_restore we process token by token so if
> we want to implement such an option then I think we will have to parse
> the complete string of CREATE TABLE command and remove the compression
> option if it exists for any attribute. I am not sure whether providing
> this option is worth the complexity?

Oh...I realize now that this is different from the "tablespace" case in that
the column compression is not stored separately in the dump. And because it
exists for each column, not for the whole table. I suppose one answer to that
would be to make compression a per-table reloption, rather than a per-attribute
option. (I can anticipate that you'll hate this idea.)

--
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-03-08 17:30:52 Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Previous Message Ibrar Ahmed 2021-03-08 17:24:03 Re: TRUNCATE on foreign table