Re: [HACKERS] Custom compression methods

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-01-04 01:22:20
Message-ID: 20210104012220.GA23940@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The most recent patch doesn't compile --without-lz4:

compress_lz4.c:191:17: error: ‘lz4_cmcheck’ undeclared here (not in a function)
.datum_check = lz4_cmcheck,
...

And fails pg_upgrade check, apparently losing track of the compression (?)

CREATE TABLE public.cmdata2 (
- f1 text COMPRESSION lz4
+ f1 text
);

You added pg_dump --no-compression, but the --help isn't updated. I think
there should also be an option for pg_restore, like --no-tablespaces. And I
think there should be a GUC for default_compression, like
default_table_access_method, so one can restore into an alternate compression
by setting PGOPTIONS=-cdefault_compression=lz4.

I'd like to be able to make all compressible columns of a table use a
non-default compression (except those which cannot), without having to use
\gexec... We have tables with up to 1600 columns. So a GUC would allow that.

Previously (on separate threads) I wondered whether pg_dump
--no-table-access-method was needed - maybe that be sufficient for this case,
too, but I think it should be possible to separately avoid restoring
compression AM and AM "proper". So maybe it'd be like --no-tableam=compress
--no-tableam=storage or --no-tableam-all.

Some language fixes:

Subject: [PATCH v16 1/6] Built-in compression method

+++ b/doc/src/sgml/ddl.sgml
@@ -3762,6 +3762,8 @@ CREATE TABLE measurement (
<productname>PostgreSQL</productname>
tables (or, possibly, foreign tables). It is possible to specify a
tablespace and storage parameters for each partition separately.
+ Partitions inherits the compression method of the parent for each column
+ however we can set different compression method for each partition.

Should say:
+ By default, each column in a partition inherits the compression method from its parent table,
+ however a different compression method can be set for each partition.

+++ b/doc/src/sgml/ref/create_table.sgml

+ <varlistentry>
+ <term><literal>INCLUDING COMPRESSION</literal></term>
+ <listitem>
+ <para>
+ Compression method of the columns will be coppied. The default
+ behavior is to exclude compression method, resulting in the copied
+ column will have the default compression method if the column type is
+ compressible.

Say:
+ Compression method of the columns will be copied. The default
+ behavior is to exclude compression methods, resulting in the
+ columns having the default compression method.

+ <varlistentry>
+ <term><literal>COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal></term>
+ <listitem>
+ <para>
+ This clause adds the compression method to a column. Compression method
+ can be set from the available built-in compression methods. The available
+ options are <literal>pglz</literal> and <literal>lz4</literal>. If the
+ compression method is not sepcified for the compressible type then it will
+ have the default compression method. The default compression method is
+ <literal>pglz</literal>.

Say "The compression method can be set from available compression methods" (or
remove this sentence).
Say "The available BUILT-IN methods are ..."
sepcified => specified

+
+ /*
+ * No point in wasting a palloc cycle if value size is out of the allowed
+ * range for compression

say "outside the allowed range"

+ if (pset.sversion >= 120000 &&
+ if (pset.sversion >= 120000 &&

A couple places that need to say >= 14

Subject: [PATCH v16 2/6] alter table set compression

+ <literal>SET COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal>
+ This clause adds compression to a column. Compression method can be set
+ from available built-in compression methods. The available built-in
+ methods are <literal>pglz</literal> and <literal>lz4</literal>.

Should say "The compression method can be set to any available method. The
built in methods are >PGLZ< or >LZ<"
That fixes grammar, and correction that it's possible to set to an available
method other than what's "built-in".

+++ b/src/include/commands/event_trigger.h
@@ -32,7 +32,7 @@ typedef struct EventTriggerData
#define AT_REWRITE_ALTER_PERSISTENCE 0x01
#define AT_REWRITE_DEFAULT_VAL 0x02
#define AT_REWRITE_COLUMN_REWRITE 0x04
-
+#define AT_REWRITE_ALTER_COMPRESSION 0x08
/*

This is losing a useful newline.

Subject: [PATCH v16 4/6] Create custom compression methods

+ This clause adds compression to a column. Compression method
+ could be created with <xref linkend="sql-create-access-method"/> or it can
+ be set from the available built-in compression methods. The available
+ built-in methods are <literal>pglz</literal> and <literal>lz4</literal>.
+ The PRESERVE list contains list of compression methods used on the column
+ and determines which of them should be kept on the column. Without
+ PRESERVE or if all the previous compression methods are not preserved then
+ the table will be rewritten. If PRESERVE ALL is specified then all the
+ previous methods will be preserved and the table will not be rewritten.
</para>
</listitem>
</varlistentry>

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index f404dd1088..ade3989d75 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -999,11 +999,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
+ could be created with <xref linkend="sql-create-access-method"/> or it can
+ be set from the available built-in compression methods. The available

remove this first "built-in" ?

+ built-in methods are <literal>pglz</literal> and <literal>lz4</literal>.

+GetCompressionAmRoutineByAmId(Oid amoid)
...
+ /* Check if it's an index access method as opposed to some other AM */
+ if (amform->amtype != AMTYPE_COMPRESSION)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("access method \"%s\" is not of type %s",
+ NameStr(amform->amname), "INDEX")));
...
+ errmsg("index access method \"%s\" does not have a handler",

In 3 places, the comment and code should say "COMPRESSION" right ?

Subject: [PATCH v16 6/6] Support compression methods options

+ If compression method has options they could be specified with
+ <literal>WITH</literal> parameter.

If *the* compression method has options, they *can* be specified with *the* ...

@@ -1004,7 +1004,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
+ method is <literal>pglz</literal>. If the compression method has options
+ they could be specified by <literal>WITH</literal>
+ parameter.

same

+static void *
+lz4_cminitstate(List *options)
+{
+ int32 *acceleration = palloc(sizeof(int32));
+
+ /* initialize with the default acceleration */
+ *acceleration = 1;
+
+ if (list_length(options) > 0)
+ {
+ ListCell *lc;
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "acceleration") == 0)
+ *acceleration = pg_atoi(defGetString(def), sizeof(int32), 0);

Don't you need to say "else: error: unknown compression option" ?

+ /*
+ * Compression option must be only valid if we are updating the compression
+ * method.
+ */
+ Assert(DatumGetPointer(acoptions) == NULL || OidIsValid(newcompression));
+

should say "need be valid only if .."

--
Justin

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-01-04 02:53:21 Re: zstd compression for pg_dump
Previous Message Pavel Stehule 2021-01-03 19:41:17 Re: [HACKERS] [PATCH] Generic type subscripting