Re: [HACKERS] Custom compression methods

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-15 19:29:05
Message-ID: CA+TgmobRynuR8n18ZyUD4pA-T3tnPk+AME9=bg=jT4s_Uxk22Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 15, 2021 at 8:14 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> In the attached patches I have changed this, ...

OK, so just looking over this patch series, here's what I think:

- 0001 and 0002 are now somewhat independent of the rest of this work,
and could be dropped, but I think they're a good idea, so I'd like to
commit them. I went over 0001 carefully this morning and didn't find
any problems. I still need to do some more review of 0002.
- 0003 through 0005 are the core of this patch set. I'd like to get
them into this release, but I think we're likely to run out of time.
- I don't think we want to proceed with 0006 at this time. It needs
broader buy-in, I think, and it also needs some other improvements,
at least to the comments.
- 0007 is not intended for commit, but just exists to fool the
CommitFest bot into testing the feature.

Regarding 0003:

The biggest thing that jumps out at me while looking at this with
fresh eyes is that the patch doesn't touch varatt_external.va_extsize
at all. In a varatt_external, we can't use va_rawsize to indicate the
compression method, because there are no bits free: the 2 bits not
required to store the size are already used to indicate what type of
varlena we've got. But that means that the size of a varlena is
limited to 1GB, so there are 2 bits free in
varatt_external.va_extsize, just like there are in
va_compressed.va_rawsize. We could store the same two bits in
varatt_external.va_extsize that we're storing in
va_compressed.va_rawsize aka va_tcinfo. That's a big deal, because
then toast_get_compression_method() doesn't have to call
toast_fetch_datum_slice() any more, which is a rather large savings.
If it's only impacting pg_column_compression() then whatever, but
that's not the case: we've got calls to
CompareCompressionMethodAndDecompress in places like intorel_receive()
and ExecModifyTable() that look pretty performance-critical.
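
Concretely, I'm imagining something along these lines (macro names
invented here just to illustrate the idea, not a finished proposal):

/*
 * A varlena can't exceed 1GB, so the top two bits of va_extsize are
 * unused and could carry the compression method, just as va_tcinfo
 * does for inline-compressed datums.
 */
#define VARLENA_EXTSIZE_BITS	30
#define VARLENA_EXTSIZE_MASK	((1U << VARLENA_EXTSIZE_BITS) - 1)

#define VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) \
	((toast_pointer).va_extsize & VARLENA_EXTSIZE_MASK)
#define VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer) \
	((toast_pointer).va_extsize >> VARLENA_EXTSIZE_BITS)
#define VARATT_EXTERNAL_SET_SIZE_AND_COMPRESS_METHOD(toast_pointer, len, cm) \
	((toast_pointer).va_extsize = \
		(uint32) (len) | ((uint32) (cm) << VARLENA_EXTSIZE_BITS))

With something like that, toast_get_compression_method() could read
the method straight out of the TOAST pointer instead of fetching a
slice of the datum.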

I think that CompareCompressionMethodAndDecompress() could be
redesigned to be more efficient by moving more of the per-tuple work
into a separate setup phase. Consider a case where the tuple has 300
columns. 299 of them are fixed-width, but column 100 is a varlena. In
an ideal world, we would do slot_getsomeattrs(slot, 100). Then we
would check whether the column is null, whether it is compressed, and
whether the compression method is the one we want. If recompression is
required, then we must slot_getallattrs(slot), memcpy all the values
to the virtual slot created for this purpose, and decompress and
recompress column 100. But if column 100 needs no recompression, then
we need not ever deform beyond it, and in no case do we need to
iterate over all 300 attributes. But the current code will do just
that. For every new tuple, it loops again over every attribute and
re-discovers which ones are varlenas. That's kinda the pits.
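
In rough pseudocode, the fast path for that example might look like
this (datum_needs_recompression() is an invented helper standing in
for the compressed-or-not and method checks; target_cm is the
destination column's compression method):

/* deform only up to the varlena column */
slot_getsomeattrs(slot, 100);

if (!slot->tts_isnull[100 - 1] &&
	datum_needs_recompression(slot->tts_values[100 - 1], target_cm))
{
	/* only now do we pay for the whole tuple */
	slot_getallattrs(slot);
	/* memcpy the values to the virtual slot, then decompress
	 * and recompress column 100 */
}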

I got thinking about this after looking at ExecFilterJunk(). That
function is extremely simple precisely because all the work of
figuring out what should be done has been precomputed. All the smarts
are in cleanmap[], which is set up before we actually begin execution.
In a similar way, you can imagine creating some sort of object, let's
call it a CompressedAttributeFilter, that looks at the tupledesc,
figures out which columns we need to consider recompressing, and puts
them in an array. Then you have a struct that
stores a pointer to the array, the number of elements in the array,
and the value of the last array element. You pass this struct to what
is now CompareCompressionMethodAndDecompress() and it can now run more
like what I described above.
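
Something like this, say (all names invented):

typedef struct CompressedAttributeFilter
{
	AttrNumber *attnums;	/* varlena columns to consider recompressing */
	int			natts;		/* number of entries in attnums */
	AttrNumber	maxattnum;	/* last entry, for slot_getsomeattrs() */
} CompressedAttributeFilter;

It gets built once from the tupledesc before execution begins, and the
per-tuple code just walks attnums instead of rediscovering the
varlenas for every tuple.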

It's possible to imagine doing even better. Imagine that for every
column we maintain an attcompression value and an
attpreservecompression value. The former indicates the compression
type for the column or '\0' if it cannot be compressed, and the latter
indicates whether any other compression type might be present. Then,
when we build the CompressedAttributeFilter object, we can exclude
varlena attributes if attpreservecompression is false and
attcompression is '\0' or matches the attcompression value for the
corresponding attribute in the table into which ExecModifyTable() or
intorel_receive() will be putting the tuple. This seems quite complex
in terms of setup, but it would allow us to elide basically all of the
per-tuple bookkeeping in a lot of common cases, such as UPDATE,
or an INSERT or CTAS into a table that's using the same compression
method as the source data. You could probably contrive it so that you
have a CompressedAttributeFilter pointer that's NULL if no such
treatment is required, just like we already do for junkfilter.
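
The filter-building test could then look roughly like this, assuming
the per-column attcompression from this patch series plus the
hypothetical attpreservecompression flag:

for (AttrNumber attnum = 1; attnum <= srcdesc->natts; attnum++)
{
	Form_pg_attribute srcatt = TupleDescAttr(srcdesc, attnum - 1);
	Form_pg_attribute dstatt = TupleDescAttr(dstdesc, attnum - 1);

	if (srcatt->attlen != -1)
		continue;		/* not a varlena, never interesting */

	/*
	 * Skip the column if it can only contain datums that are either
	 * uncompressed or already compressed with the destination's
	 * method.
	 */
	if (!srcatt->attpreservecompression &&
		(srcatt->attcompression == '\0' ||
		 srcatt->attcompression == dstatt->attcompression))
		continue;

	filter->attnums[filter->natts++] = attnum;
}

If the loop ends with natts == 0, the caller gets a NULL filter
pointer and the per-tuple work disappears entirely.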

There's another, rather brute-force approach to this problem, too. We
could just decide that lz4 will only be used for external data, and
that there's no such thing as an inline-compressed lz4 varlena.
detoast_fetch_datum() would just notice that the value is lz4'd and
de-lz4 it before returning it, since an inline-compressed lz4 datum
would be impossible.
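
In the fetch path, that might amount to no more than this (the
constant and helper names are invented):

struct varlena *result = toast_fetch_datum(attr);

if (VARATT_IS_COMPRESSED(result) &&
	TOAST_COMPRESS_METHOD(result) == LZ4_COMPRESSION_ID)
	result = lz4_decompress_datum(result);	/* invented helper */
return result;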

I'm open to being convinced that we don't need to do either of these
things, and that the cost of iterating over all varlenas in the tuple
is not so bad as to preclude doing things as you have them here. But,
I'm afraid it's going to be too expensive.

--
Robert Haas
EDB: http://www.enterprisedb.com
