Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-01-04 11:27:16
Message-ID: CAFiTN-utJPbTLQ9i10wT_zmHX=un+RQMB1B1xbkTgrh971vqjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> The most recent patch doesn't compile --without-lz4:
>
> compress_lz4.c:191:17: error: ‘lz4_cmcheck’ undeclared here (not in a function)
> .datum_check = lz4_cmcheck,
> ...
>

My bad, fixed this.

> And fails pg_upgrade check, apparently losing track of the compression (?)
>
> CREATE TABLE public.cmdata2 (
> - f1 text COMPRESSION lz4
> + f1 text
> );

I did not get this? pg_upgrade check is passing for me.

> You added pg_dump --no-compression, but the --help isn't updated.

Fixed.

I think
> there should also be an option for pg_restore, like --no-tablespaces. And I
> think there should be a GUC for default_compression, like
> default_table_access_method, so one can restore into an alternate compression
> by setting PGOPTIONS=-cdefault_compression=lz4.
>
> I'd like to be able to make all compressible columns of a table use a
> non-default compression (except those which cannot), without having to use
> \gexec... We have tables with up to 1600 columns. So a GUC would allow that.
>
> Previously (on separate threads) I wondered whether pg_dump
> --no-table-access-method was needed - maybe that be sufficient for this case,
> too, but I think it should be possible to separately avoid restoring
> compression AM and AM "proper". So maybe it'd be like --no-tableam=compress
> --no-tableam=storage or --no-tableam-all.

I will put more thought into this and respond separately.

> Some language fixes:
>
> Subject: [PATCH v16 1/6] Built-in compression method
>
> +++ b/doc/src/sgml/ddl.sgml
> @@ -3762,6 +3762,8 @@ CREATE TABLE measurement (
> <productname>PostgreSQL</productname>
> tables (or, possibly, foreign tables). It is possible to specify a
> tablespace and storage parameters for each partition separately.
> + Partitions inherits the compression method of the parent for each column
> + however we can set different compression method for each partition.
>
> Should say:
> + By default, each column in a partition inherits the compression method from its parent table,
> + however a different compression method can be set for each partition.

Done

> +++ b/doc/src/sgml/ref/create_table.sgml
>
> + <varlistentry>
> + <term><literal>INCLUDING COMPRESSION</literal></term>
> + <listitem>
> + <para>
> + Compression method of the columns will be coppied. The default
> + behavior is to exclude compression method, resulting in the copied
> + column will have the default compression method if the column type is
> + compressible.
>
> Say:
> + Compression method of the columns will be copied. The default
> + behavior is to exclude compression methods, resulting in the
> + columns having the default compression method.

Done

> + <varlistentry>
> + <term><literal>COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal></term>
> + <listitem>
> + <para>
> + This clause adds the compression method to a column. Compression method
> + can be set from the available built-in compression methods. The available
> + options are <literal>pglz</literal> and <literal>lz4</literal>. If the
> + compression method is not sepcified for the compressible type then it will
> + have the default compression method. The default compression method is
> + <literal>pglz</literal>.
>
> Say "The compression method can be set from available compression methods" (or
> remove this sentence).
> Say "The available BUILT-IN methods are ..."
> sepcified => specified

Done

> +
> + /*
> + * No point in wasting a palloc cycle if value size is out of the allowed
> + * range for compression
>
> say "outside the allowed range"
>
> + if (pset.sversion >= 120000 &&
> + if (pset.sversion >= 120000 &&
>
> A couple places that need to say >= 14

Fixed

> Subject: [PATCH v16 2/6] alter table set compression
>
> + <literal>SET COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal>
> + This clause adds compression to a column. Compression method can be set
> + from available built-in compression methods. The available built-in
> + methods are <literal>pglz</literal> and <literal>lz4</literal>.
>
> Should say "The compression method can be set to any available method. The
> built in methods are >PGLZ< or >LZ<"
> That fixes grammar, and correction that it's possible to set to an available
> method other than what's "built-in".

Done

> +++ b/src/include/commands/event_trigger.h
> @@ -32,7 +32,7 @@ typedef struct EventTriggerData
> #define AT_REWRITE_ALTER_PERSISTENCE 0x01
> #define AT_REWRITE_DEFAULT_VAL 0x02
> #define AT_REWRITE_COLUMN_REWRITE 0x04
> -
> +#define AT_REWRITE_ALTER_COMPRESSION 0x08
> /*
>
> This is losing a useful newline.

Fixed

> Subject: [PATCH v16 4/6] Create custom compression methods
>
> + This clause adds compression to a column. Compression method
> + could be created with <xref linkend="sql-create-access-method"/> or it can
> + be set from the available built-in compression methods. The available
> + built-in methods are <literal>pglz</literal> and <literal>lz4</literal>.
> + The PRESERVE list contains list of compression methods used on the column
> + and determines which of them should be kept on the column. Without
> + PRESERVE or if all the previous compression methods are not preserved then
> + the table will be rewritten. If PRESERVE ALL is specified then all the
> + previous methods will be preserved and the table will not be rewritten.
> </para>
> </listitem>
> </varlistentry>
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index f404dd1088..ade3989d75 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -999,11 +999,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
> + could be created with <xref linkend="sql-create-access-method"/> or it can
> + be set from the available built-in compression methods. The available
>
> remove this first "built-in" ?

Done

> + built-in methods are <literal>pglz</literal> and <literal>lz4</literal>.
>
>
> +GetCompressionAmRoutineByAmId(Oid amoid)
> ...
> + /* Check if it's an index access method as opposed to some other AM */
> + if (amform->amtype != AMTYPE_COMPRESSION)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("access method \"%s\" is not of type %s",
> + NameStr(amform->amname), "INDEX")));
> ...
> + errmsg("index access method \"%s\" does not have a handler",
>
> In 3 places, the comment and code should say "COMPRESSION" right ?

Fixed, along with some other refactoring around this code.

> Subject: [PATCH v16 6/6] Support compression methods options
>
> + If compression method has options they could be specified with
> + <literal>WITH</literal> parameter.
>
> If *the* compression method has options, they *can* be specified with *the* ...

Done

> @@ -1004,7 +1004,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
> + method is <literal>pglz</literal>. If the compression method has options
> + they could be specified by <literal>WITH</literal>
> + parameter.
>
> same

Done

> +static void *
> +lz4_cminitstate(List *options)
> +{
> + int32 *acceleration = palloc(sizeof(int32));
> +
> + /* initialize with the default acceleration */
> + *acceleration = 1;
> +
> + if (list_length(options) > 0)
> + {
> + ListCell *lc;
> +
> + foreach(lc, options)
> + {
> + DefElem *def = (DefElem *) lfirst(lc);
> +
> + if (strcmp(def->defname, "acceleration") == 0)
> + *acceleration = pg_atoi(defGetString(def), sizeof(int32), 0);
>
> Don't you need to say "else: error: unknown compression option" ?

Done

> + /*
> + * Compression option must be only valid if we are updating the compression
> + * method.
> + */
> + Assert(DatumGetPointer(acoptions) == NULL || OidIsValid(newcompression));
> +
>
> should say "need be valid only if .."

Changed.

Apart from this, I have also done some refactoring and comment improvement.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v17-0003-Add-support-for-PRESERVE.patch application/octet-stream 45.9 KB
v17-0001-Built-in-compression-method.patch application/octet-stream 224.9 KB
v17-0002-alter-table-set-compression.patch application/octet-stream 17.8 KB
v17-0004-Create-custom-compression-methods.patch application/octet-stream 31.7 KB
v17-0005-new-compression-method-extension-for-zlib.patch application/octet-stream 10.0 KB
v17-0006-Support-compression-methods-options.patch application/octet-stream 62.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-01-04 11:50:15 Re: Single transaction in the tablesync worker?
Previous Message Hou, Zhijie 2021-01-04 11:16:02 RE: Parallel Inserts in CREATE TABLE AS