Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-05 14:43:58
Message-ID: CAFiTN-vZHv=MKADfi75cgvS48889n+K-fBP0i2HCDPa1bwfu6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 2, 2021 at 2:45 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Some more review comments:
>
> 'git am' barfs on v0001 because it's got a whitespace error.

Fixed

> VARFLAGS_4B_C() doesn't seem to be used in any of the patches. I'm OK
> with keeping it even if it's not used just because maybe someone will
> need it later but, uh, don't we need to use it someplace?

Actually I was using TOAST_COMPRESS_METHOD and that required inclusion
of toast_internal.h so now
I have used VARFLAGS_4B_C and with that we are able to remove the
inclusion of toast_internal.h
in unwanted places.

> To avoid moving the goalposts for a basic install, I suggest that
> --with-lz4 should default to disabled. Maybe we'll want to rethink
> that at some point, but since we're just getting started with this
> whole thing, I don't think now is the time.

Done

> The change to ddl.sgml doesn't seem to make sense to me. There might
> be someplace where we want to explain how properties are inherited in
> partitioning hierarchies, but I don't think this is the right place,
> and I don't think this explanation is particularly clear.

Not yet done, I thought at the same place we are describing the
storage relationship with the partition so that is the place for the
compression also. Maybe I will have to read ddl.sgml file and find
out the most suitable place. So I kept is as pending.

> + This clause adds the compression method to a column. The Compression
> + method can be set from available compression methods. The built-in
> + methods are <literal>pglz</literal> and <literal>lz4</literal>.
> + If no compression method is specified, then compressible types will have
> + the default compression method <literal>pglz</literal>.
>
> Suggest: This sets the compression method for a column. The supported
> compression methods are <literal>pglz</literal> and
> <literal>lz4</literal>. <literal>lz4</literal> is available only if
> <literal>--with-lz4</literal> was used when building
> <productname>PostgreSQL</productname>. The default is
> <literal>pglz</literal>.

Done

> We should make sure, if you haven't already, that trying to create a
> column with LZ4 compression fails at table creation time if the build
> does not support LZ4. But, someone could also create a table using a
> build that has LZ4 support and then switch to a different set of
> binaries that do not have it, so we need the runtime checks also.
> However, those runtime checks shouldn't fail simplify from trying to
> access a table that is set to use LZ4 compression; they should only
> fail if we actually need to decompress an LZ4'd value.

Done, I have cheched the compression method Oid if it LZ4 and if the
lz4 library is not install
then error out. We can also use the handler sepcific check function
but I am not sure does that make
sense to add extra routine for that. In later patch 0006 we have an
check function to verify the
option so during that we can error out and no need to check this outside.

> Since indexes don't have TOAST tables, it surprises me that
> brin_form_tuple() thinks it can TOAST anything. But I guess that's not
> this patch's problem, if it's a problem at all.

it is just trying to compress it not externalize.

> I like the fact that you changed the message "compressed data is
> corrupt" to indicate the compression method, but I think the resulting
> message doesn't follow style guidelines because I don't believe we
> normally put something with a colon prefix at the beginning of a
> primary error message. So instead of saying "pglz: compressed data is
> corrupt" I think you should say something like "compressed pglz data
> is corrupt". Also, I suggest that we take this opportunity to switch
> to ereport() rather than elog() and set
> errcode(ERRCODE_DATA_CORRUPTED).

Done

>
> What testing have you done for performance impacts? Does the patch
> slow things down noticeably with pglz? (Hopefully not.) Can you
> measure a performance improvement with pglz? (Hopefully so.) Is it
> likely to hurt performance that there's no minimum size for lz4
> compression as we have for pglz? Seems like that could result in a lot
> of wasted cycles trying to compress short strings.

Not sure what to do about this, I will check the performance with
small varlenas and see.

> pglz_cmcompress() cancels compression if the resulting value would be
> larger than the original one, but it looks like lz4_cmcompress() will
> just store the enlarged value. That seems bad.

you mean lz4_cmcompress, Done

> pglz_cmcompress() doesn't need to pfree(tmp) before elog(ERROR).

Done

> CompressionOidToId(), CompressionIdToOid() and maybe other places need
> to remember the message style guidelines. Primary error messages are
> not capitalized.

Fixed

> Why should we now have to include toast_internals.h in
> reorderbuffer.c, which has no other changes? That definitely shouldn't
> be necessary. If something in another header file now requires
> something from toast_internals.h, then that header file would be
> obliged to include toast_internals.h itself. But actually that
> shouldn't happen, because the whole point of toast_internals.h is that
> it should not be included in very many places at all. If we're adding
> stuff there that is going to be broadly needed, we're adding it in the
> wrong place.

Done

> varlena.c shouldn't need toast_internals.h either, and if it did, it
> should be in alphabetical order.
>

It was the wrong usage, fixed now.

Please refer to the latest patch at

https://www.postgresql.org/message-id/CAFiTN-v9Cs1MORnp-3bGZ5QBwr5v3VarSvfaDizHi1acXES5xQ%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-02-05 14:46:32 Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Previous Message Dilip Kumar 2021-02-05 14:41:41 Re: [HACKERS] Custom compression methods