Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-16 10:37:22
Message-ID: CAFiTN-v1DYnO10LOG2Wse+9WY3JFhzKh_7cJ4kRTqJS8Mqx9Vg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 16, 2021 at 12:59 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Mar 15, 2021 at 8:14 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > In the attached patches I have changed this, ...
>
> OK, so just looking over this patch series, here's what I think:
>
> Regarding 0003:
>
> The biggest thing that jumps out at me while looking at this with
> fresh eyes is that the patch doesn't touch varatt_external.va_extsize
> at all. In a varatt_external, we can't use the va_rawsize to indicate
> the compression method, because there are no bits free, because the 2
> bits not required to store the size are used to indicate what type of
> varlena we've got. But, that means that the size of a varlena is
> limited to 1GB, so there are 2 bits free in
> varatt_external.va_extsize, just like there are in
> va_compressed.va_rawsize. We could store the same two bits in
> varatt_external.va_extsize that we're storing in
> va_compressed.va_rawsize aka va_tcinfo. That's a big deal, because
> then toast_get_compression_method() doesn't have to call
> toast_fetch_datum_slice() any more, which is a rather large savings.
> If it's only impacting pg_column_compression() then whatever, but
> that's not the case: we've got calls to
> CompareCompressionMethodAndDecompress in places like intorel_receive()
> and ExecModifyTable() that look pretty performance-critical.

Yeah, right, we can do this.
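
Something along these lines, I suppose, as a rough sketch (the macro
names are mine and not final, and this assumes the existing
varatt_external definition from postgres.h): since a varlena can never
exceed 1GB, the top two bits of va_extsize are free, so we can mirror
what va_compressed.va_tcinfo does.

/*
 * Rough sketch, not the final patch: carve the top two bits of
 * varatt_external.va_extsize for the compression method, since the
 * external size itself never needs more than 30 bits (1GB limit).
 */
#define VARLENA_EXTSIZE_BITS	30
#define VARLENA_EXTSIZE_MASK	((((uint32) 1) << VARLENA_EXTSIZE_BITS) - 1)

/* external size with the method bits masked off */
#define VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) \
	((uint32) (toast_pointer).va_extsize & VARLENA_EXTSIZE_MASK)

/* the 2-bit compression method id (pglz or lz4) */
#define VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer) \
	((uint32) (toast_pointer).va_extsize >> VARLENA_EXTSIZE_BITS)

/* store size and method together when building the toast pointer */
#define VARATT_EXTERNAL_SET_SIZE_AND_COMPRESS_METHOD(toast_pointer, len, cm) \
	((toast_pointer).va_extsize = \
		(uint32) (len) | ((uint32) (cm) << VARLENA_EXTSIZE_BITS))

With that, toast_get_compression_method() can read the method straight
out of the toast pointer instead of calling toast_fetch_datum_slice().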

> I got thinking about this after looking at ExecFilterJunk(). That
> function is extremely simple precisely because all the work of
> figuring out what should be done has been precomputed. All the smarts
> are in cleanmap[], which is set up before we actually begin execution.
> In a similar way, you can imagine creating some sort of object, let's
> call it a CompressedAttributeFilter, that looks at the tupledesc,
> figures out from the tupledesc which columns we need to consider
> recompressing, and puts them in an array. Then you have a struct that
> stores a pointer to the array, the number of elements in the array,
> and the value of the last array element. You pass this struct to what
> is now CompareCompressionMethodAndDecompress() and it can now run more
> like what I described above.

Okay, I will work on this. Basically, the main idea is that instead of
identifying which attributes are varlena for every tuple, we already
have the tupledesc, so we can prepare some sort of map up front and, in
CompareCompressionMethodAndDecompress(), only check those attributes
for whether they are a) non-null, b) compressed, and c) compressed with
a different compression method, and only decompress if all of these
hold. I agree this will save the cost of processing the complete tuple
descriptor for every tuple.
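
Just to make sure I am on the same page, roughly something like the
following (all names are placeholders, not final):

#include "postgres.h"
#include "access/tupdesc.h"

/*
 * Sketch only: per-target-relation state built once from the tupledesc,
 * so the per-tuple loop only visits varlena columns that could actually
 * hold compressed data.
 */
typedef struct CompressedAttributeFilter
{
	int16	   *attnums;	/* varlena attribute numbers worth checking */
	int			natts;		/* number of entries in attnums */
	int16		maxattnum;	/* highest entry, for a cheap early-exit test */
} CompressedAttributeFilter;

static CompressedAttributeFilter *
build_compressed_attribute_filter(TupleDesc tupdesc)
{
	CompressedAttributeFilter *filter = palloc(sizeof(CompressedAttributeFilter));

	filter->attnums = palloc(tupdesc->natts * sizeof(int16));
	filter->natts = 0;

	for (int i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		/* only live varlena columns can carry compressed data */
		if (att->attisdropped || att->attlen != -1)
			continue;

		filter->attnums[filter->natts++] = i + 1;
	}

	filter->maxattnum = (filter->natts > 0) ?
		filter->attnums[filter->natts - 1] : 0;

	return filter;
}

CompareCompressionMethodAndDecompress() would then loop only over
filter->attnums instead of the whole tuple descriptor.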

> It's possible to imagine doing even better. Imagine that for every
> column we maintain an attcompression value and an
> attpreservecompression value. The former indicates the compression
> type for the column or '\0' if it cannot be compressed, and the latter
> indicates whether any other compression type might be present. Then,
> when we build the CompressedAttributeFilter object, we can exclude
> varlena attributes if attpreservecompression is false and
> attcompression is '\0' or matches the attcompression value for the
> corresponding attribute in the table into which ExecModifyTable() or
> intorel_receive() will be putting the tuple. This seems quite complex
> in terms of bookkeeping, but it would allow us to elide basically all
> of the per-tuple bookkeeping in a lot of common cases, such as UPDATE,
> or an INSERT or CTAS into a table that's using the same compression
> method as the source data.

I haven't thought about this completely, but it looks quite complex.
I mean, at initplan time we would have to process the complete tree to
fetch the attcompression of the source attributes, because in
ExecModifyTable() or CTAS the source tuple could be a combination of
joins over multiple tables, function calls, or any other source, so
keeping track of the attcompression of each source attribute seems
very difficult. And moreover, unless we always guarantee that all the
data in the table is compressed as per the attribute compression, we
cannot rely on the attcompression of the source field; I mean, we can
alter the attribute compression without rewriting the table.
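
That said, just to spell out what the bookkeeping would look like: if
we did have those fields, the per-column exclusion test while building
the filter might be roughly as below. attpreservecompression is the
hypothetical per-column flag from your mail (it does not exist today),
so this is only a sketch.

/*
 * Sketch only, extending the filter idea above: decide whether a source
 * column can be skipped entirely.  attpreservecompression is hypothetical;
 * attcompression is the per-column method ('\0' if not compressible).
 */
static bool
column_needs_recompression_check(Form_pg_attribute src, Form_pg_attribute dst)
{
	if (src->attisdropped || src->attlen != -1)
		return false;			/* not a varlena column */

	if (!src->attpreservecompression &&
		(src->attcompression == '\0' ||
		 src->attcompression == dst->attcompression))
		return false;			/* compression already known to match */

	return true;				/* needs the per-tuple check */
}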

> There's another, rather brute-force approach to this problem, too. We
> could just decide that lz4 will only be used for external data, and
> that there's no such thing as an inline-compressed lz4 varlena.
> detoast_fetch_datum() would just notice that the value is lz4'd and
> de-lz4 it before returning it, since an inline-compressed lz4 datum is
> impossible.

Couldn't this cause multiple compression/decompression cycles during
insert? I mean, if attcompression is lz4, we first compress with lz4
because the value might be externalized, and then we realize that the
compression reduced the size significantly and we don't need to
externalize after all. Should we then decompress and compress again
using pglz, or skip compression and just externalize?
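
To spell out the flow I am worried about, something like the sketch
below; all the helper names here are illustrative, not from the patch.

#include "postgres.h"

/* hypothetical helpers, standing in for whatever the patch would use */
extern struct varlena *lz4_compress_varlena(struct varlena *value);
extern struct varlena *pglz_compress_varlena(struct varlena *value);
extern bool fits_inline(struct varlena *value);

static struct varlena *
compress_for_insert(struct varlena *value, char attcompression,
					bool *will_externalize)
{
	if (attcompression == 'l')	/* column asks for lz4 (illustrative) */
	{
		struct varlena *tmp = lz4_compress_varlena(value);

		if (fits_inline(tmp))
		{
			/*
			 * Inline lz4 would be forbidden under the external-only rule,
			 * so we either pay a second cycle with pglz ...
			 */
			pfree(tmp);
			return pglz_compress_varlena(value);
			/* ... or leave the value uncompressed and externalize anyway. */
		}
		*will_externalize = true;
		return tmp;
	}
	return value;
}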

> I'm open to being convinced that we don't need to do either of these
> things, and that the cost of iterating over all varlenas in the tuple
> is not so bad as to preclude doing things as you have them here. But,
> I'm afraid it's going to be too expensive.

I think we will have to do something about this. I have tested the
performance of a simple INSERT and a CTAS on a table with 300 columns
(script attached), and there is a HUGE impact on performance, so
something must be done.

INSERT TIME:
Head: 17418.299 ms, Patch: 20956.231 ms

CTAS TIME:
Head: 12837.872 ms, Patch: 16775.739 ms

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
test1.sql application/sql 3.6 KB
