Re: Re: [HACKERS] Custom compression methods

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: [HACKERS] Custom compression methods
Date: 2020-08-31 05:14:25
Message-ID: CAJ3gD9c370yZFgHRAv+Up987o=t94sr22R36T=BKX5=_fw=TzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 13 Aug 2020 at 17:18, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> I have rebased the patch on the latest head and currently, broken into 3 parts.
>
> v1-0001: As suggested by Robert, it provides the syntax support for
> setting the compression method for a column while creating a table and
> adding columns. However, we don't support changing the compression
> method for the existing column. As part of this patch, there is only
> one built-in compression method that can be set (pglz). In this, we
> have one in-build am (pglz) and the compressed attributes will directly
> store the oid of the AM. In this patch, I have removed the
> pg_attr_compresion as we don't support changing the compression
> for the existing column so we don't need to preserve the old
> compressions.
> v1-0002: Add another built-in compression method (zlib)
> v1:0003: Remaining patch set (nothing is changed except rebase on the
> current head, stabilizing check-world and 0001 and 0002 are pulled
> out of this)
>
> Next, I will be working on separating out the remaining patches as per
> the suggestion by Robert.

Thanks for this new feature. Looks promising and very useful, with so
many good compression libraries already available.

I see that with the patch-set, I would be able to create an extension
that defines a PostgreSQL C handler function which assigns all the
required hook function implementations for compressing, decompressing
and validating, etc. In short, I would be able to use a completely
different compression algorithm to compress toast data if I write such
an extension. Correct me if I am wrong with my interpretation.

Just a quick superficial set of review comments ....

A minor re-base is required due to a conflict in a regression test

-------------

In heap_toast_insert_or_update() and in other places, the comments for
new parameter preserved_am_info are missing.

-------------

+toast_compress_datum(Datum value, Oid acoid)
{
struct varlena *tmp = NULL;
int32 valsize;
- CompressionAmOptions cmoptions;
+ CompressionAmOptions *cmoptions = NULL;

I think tmp and cmoptions need not be initialized to NULL

-------------

- TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize);
- SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
/* successful compression */
+ toast_set_compressed_datum_info(tmp, amoid, valsize);
return PointerGetDatum(tmp);

Any particular reason why is this code put in a new extern function ?
Is there a plan to re-use it ? Otherwise, it's not necessary to do
this.

------------

Also, not sure why "HTAB *amoptions_cache" and "MemoryContext
amoptions_cache_mcxt" aren't static declarations. They are being used
only in toast_internals.c

-----------

The tab-completion doesn't show COMPRESSION :
postgres=# create access method my_method TYPE
INDEX TABLE
postgres=# create access method my_method TYPE

Also, the below syntax also would better be tab-completed so as to
display all the installed compression methods, in line with how we
show all the storage methods like plain,extended,etc:
postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION

------------

I could see the differences in compression ratio, and the compression
and decompression speed when I use lz versus zib :

CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4'));
create table lztab(t text);
ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz;

pgg:s2:pg$ time psql -c "\copy zlibtab from text.data"
COPY 13050

real 0m1.344s
user 0m0.031s
sys 0m0.026s

pgg:s2:pg$ time psql -c "\copy lztab from text.data"
COPY 13050

real 0m2.088s
user 0m0.008s
sys 0m0.050s

pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass),
pg_table_size('lztab'::regclass)"
pg_table_size | pg_table_size
---------------+---------------
1261568 | 1687552

pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like '0000'"
> /dev/null

real 0m0.127s
user 0m0.000s
sys 0m0.002s

pgg:s2:pg$ time psql -c "select NULL from lztab where t like '0000'"
> /dev/null

real 0m0.050s
user 0m0.002s
sys 0m0.000s

--
Thanks,
-Amit Khandekar
Huawei Technologies

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-08-31 05:18:48 Re: list of extended statistics on psql
Previous Message Rahila 2020-08-31 05:06:13 Re: More tests with USING INDEX replident and dropped indexes