Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-07 11:45:42
Message-ID: CAFiTN-unUNcgU0M29bdTrGHA4agK8T7g+dpMyVMeRtPawSWhCw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 5, 2021 at 8:11 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > Even more review comments, still looking mostly at 0001:
> >
> > If there's a reason why parallel_schedule is arranging to run the
> > compression test in parallel with nothing else, the comment in that
> > file should explain the reason. If there isn't, it should be added to
> > a parallel group that doesn't have the maximum number of tests yet,
> > probably the last such group in the file.
> >
> > serial_schedule should add the test in a position that roughly
> > corresponds to where it appears in parallel_schedule.
>
> Done
>
> > I believe it's relatively standard practice to put variable
> > declarations at the top of the file. compress_lz4.c and
> > compress_pglz.c instead put those declarations nearer to the point of
> > use.
>
> Do you mean pglz_compress_methods and lz4_compress_methods? I
> followed that style from heapam_handler.c. If you think that doesn't
> look good, I can move them up.
>
> > compressamapi.c has an awful lot of #include directives for the code
> > it actually contains. I believe that we should cut that down to what
> > is required by 0001, and other patches can add more later as required.
> > In fact, it's tempting to just get rid of this .c file altogether and
> > make the two functions it contains static inline functions in the
> > header, but I'm not 100% sure that's a good idea.
>
> I think it looks better to move them to compressamapi.h, so I have
> done that.
>
> > The copyright dates in a number of the file headers are out of date.
>
> Fixed
>
> > binary_upgrade_next_pg_am_oid and the related changes to
> > CreateAccessMethod don't belong in 0001, because it doesn't support
> > non-built-in compression methods. These changes and the related
> > pg_dump change should be moved to the patch that adds support for
> > that.
>
> Fixed
>
> > The comments added to dumpTableSchema() say that "compression is
> > assigned by ALTER" but don't give a reason. I think they should. I
> > don't know how much they need to explain about what the code does, but
> > they definitely need to explain why it does it. Also, isn't this bad?
> > If we create the column with the wrong compression setting initially
> > and then ALTER it, we have to rewrite the table. If it's empty, that's
> > cheap, but it'd still be better not to do it at all.
>
> Yeah, actually that part should go in the 0003 patch, where we
> implement the custom compression methods. In that patch we need to
> assign the compression by ALTER because we want to keep the preserved
> methods as well, so I will add it there.
>
> > I'm not sure it's a good idea for dumpTableSchema() to leave out
> > specifying the compression method if it happens to be pglz. I think we
> > definitely shouldn't do it in binary-upgrade mode. What if we changed
> > the default in a future release? For that matter, even 0002 could make
> > the current approach unsafe.... I think, anyway.
>
> Fixed
>
>
> > The changes to pg_dump.h look like they haven't had a visit from
> > pgindent. You should probably try to do that for the whole patch,
> > though it's a bit annoying since you'll have to manually remove
> > unrelated changes to the same files that are being modified by the
> > patch. Also, why the extra blank line here?
>
> Fixed, ran pgindent for other files as well.
>
> > GetAttributeCompression() is hard to understand. I suggest changing
> > the comment to "resolve column compression specification to an OID"
> > and somehow rejigger the code so that you aren't using one not-NULL
> > test and one NULL test on the same variable. Like maybe change the
> > first part to if (!IsStorageCompressible(typstorage)) { if
> > (compression == NULL) return InvalidOid; ereport(ERROR, ...); }
>
> Done
>
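To make that concrete, the resolved GetAttributeCompression shape is
roughly the following (a sketch, not the exact patch code;
DefaultCompressionOid and get_compression_am_oid are placeholders for
whatever the patch actually uses):

/*
 * GetAttributeCompression - resolve column compression spec to an OID
 */
static Oid
GetAttributeCompression(Form_pg_attribute att, char *compression)
{
    /* Non-compressible storage only allows an empty specification. */
    if (!IsStorageCompressible(att->attstorage))
    {
        if (compression == NULL)
            return InvalidOid;
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("column data type %s does not support compression",
                        format_type_be(att->atttypid))));
    }

    /* No explicit specification, so use the default method. */
    if (compression == NULL)
        return DefaultCompressionOid;   /* placeholder name */

    return get_compression_am_oid(compression, false);  /* placeholder */
}
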
> > It puzzles me that CompareCompressionMethodAndDecompress() calls
> > slot_getallattrs() just before clearing the slot. It seems like this
> > ought to happen before we loop over the attributes, so that we don't
> > need to call slot_getattr() every time. See the comment for that
> > function. But even if we didn't do that for some reason, why would we
> > do it here? If it's already been done, it shouldn't do anything, and
> > if it hasn't been done, it might overwrite some of the values we just
> > poked into tts_values. It also seems suspicious that we can get away
> > with clearing the slot and then again marking it valid. I'm not sure
> > it really works like that. Like, can't clearing the slot invalidate
> > pointers stored in tts_values[]? For instance, if they are pointers
> > into an in-memory heap tuple, tts_heap_clear() is going to free the
> > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is
> > going to unpin it. I think the supported procedure for this sort of
> > thing is to have a second slot, set tts_values, tts_isnull etc. and
> > then materialize the slot. After materializing the new slot, it's
> > independent of the old slot, which can then be cleared. See for
> > example tts_virtual_materialize(). The whole approach you've taken
> > here might need to be rethought a bit. I think you are right to want
> > to avoid copying everything over into a new slot if nothing needs to
> > be done, and I think we should definitely keep that optimization, but
> > I think if you need to copy stuff, you have to do the above procedure
> > and then continue using the other slot instead of the original one.
> > Some places I think we have functions that return either the original
> > slot or a different one depending on how it goes; that might be a
> > useful idea here. But, you also can't just spam-create slots; it's
> > important that whatever ones we end up with get reused for every
> > tuple.
>
> I have changed this algorithm, so now if we have to decompress
> anything we will use a new slot, and we will attach that new slot to
> the ModifyTableState, to DR_transientrel for matviews, and to
> DR_intorel for CTAS. Does this look okay, or do we need to do
> something else? If this logic looks fine, then maybe we can think of
> some more optimization and cleanup in this function.
>
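For reference, the two-slot procedure now looks roughly like the sketch
below (names are illustrative; in the patch the new slot is created
once and reused for every tuple, not allocated per call):

static TupleTableSlot *
decompress_into_new_slot(TupleTableSlot *slot, TupleTableSlot *newslot)
{
    int         natts = slot->tts_tupleDescriptor->natts;

    slot_getallattrs(slot);     /* deform once, before reading tts_values */

    ExecClearTuple(newslot);
    memcpy(newslot->tts_values, slot->tts_values, natts * sizeof(Datum));
    memcpy(newslot->tts_isnull, slot->tts_isnull, natts * sizeof(bool));

    /* ... overwrite the compressed datums in newslot->tts_values ... */

    ExecStoreVirtualTuple(newslot);
    ExecMaterializeSlot(newslot);   /* copy out; independent of *slot now */

    return newslot;             /* caller continues with this slot */
}

Here newslot is a virtual slot made once with
MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual) and stashed in the
ModifyTableState / DR_intorel / DR_transientrel as described.
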
>
> > Doesn't the change to describeOneTableDetails() require changing the
> > declaration of char *headers[11] to char *headers[12]? How does this
> > not fail Assert(cols <= lengthof(headers))?
>
> Fixed
>
> > Why does describeOneTableDetails() arrange to truncate the printed
> > value? We don't seem to do that for the other column properties, and
> > it's not like this one is particularly long.
>
> Not required, fixed.
>
> > Perhaps the changes to pg_am.dat shouldn't remove the blank line?
>
> Fixed
>
> > I think the comment in pg_attribute.h could be rephrased to say
> > something like: "OID of compression AM. Must be InvalidOid if and only
> > if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the
> > right letters are. This would be shorter and I think also clearer than
> > what you have.
>
> Fixed
>
> > The first comment change in postgres.h is wrong. You changed
> > va_extsize to "size in va_extinfo" but the associated structure
> > definition is unchanged, so the comment shouldn't be changed either.
>
> Yup, not required.
>
> > In toast_internals.h, you end using 30 as a constant several times but
> > have no #define for it. You do have a #define for RAWSIZEMASK, but
> > that's really a derived value from 30. Also, it's not a great name
> > because it's kind of generic. So how about something like:
> >
> > #define TOAST_RAWSIZE_BITS 30
> > #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAW_SIZE_BITS + 1)) - 1)
> >
> > But then again on second thought, this 30 seems to be the same 30 that
> > shows up in the changes to postgres.h, and there again 0x3FFFFFFF
> > shows up too. So maybe we should actually be defining these constants
> > there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK
> > and then having toast_internals.h use those constants as well.
>
> Done. IMHO it should be:
> #define VARLENA_RAWSIZE_BITS 30
> #define VARLENA_RAWSIZE_MASK ((1 << VARLENA_RAWSIZE_BITS) - 1)
>
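With that definition the mask works out to exactly the 0x3FFFFFFF
literal that postgres.h already uses; a quick static assertion (sketch)
pins the relationship down:

#define VARLENA_RAWSIZE_BITS 30
#define VARLENA_RAWSIZE_MASK ((1 << VARLENA_RAWSIZE_BITS) - 1)

/* (1 << 30) - 1 == 0x3FFFFFFF, leaving the top two bits free */
StaticAssertDecl(VARLENA_RAWSIZE_MASK == 0x3FFFFFFF,
                 "VARLENA_RAWSIZE_MASK must match postgres.h");
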
>
> > Taken with the email I sent yesterday, I think this is a more or less
> > complete review of 0001. Although there are a bunch of things to fix
> > here still, I don't think this is that far from being committable. I
> > don't at this point see too much in terms of big design problems.
> > Probably the CompareCompressionMethodAndDecompress() is the closest to
> > a design-level problem, and certainly something needs to be done about
> > it, but even that is a fairly localized problem in the context of the
> > entire patch.
>
> 0001 is attached; the pending parts are now:
>
> - Confirm the new design of CompareCompressionMethodAndDecompress
> - Performance test, especially lz4 with small varlena

I have tested the performance of pglz vs. lz4.

Test 1: A small, simple string; pglz doesn't attempt compression but
lz4 does, since lz4 has no minimum size limit
Table: 100 varchar columns
Test: Insert 1000 tuples, each column a 25-byte string (32 bytes is the
minimum limit for pglz)
Result:
pglz: 1030 ms (doesn't attempt compression, so externalizes)
lz4: 212 ms

Test 2: A small incompressible string; pglz doesn't attempt
compression, lz4 attempts it but cannot compress
Table: 100 varchar columns
Test: Insert 1000 tuples, each column a 25-byte string (32 bytes is the
minimum limit for pglz)
Result:
pglz: 1030 ms (doesn't attempt compression, so externalizes)
lz4: 1090 ms (attempts to compress, but still externalizes)

Test 3: A few columns with large random data
Table: 3 varchar columns
Test: Insert 1000 tuples, 3 columns of sizes 3500 bytes, 4200 bytes,
and 4900 bytes
Result:
pglz: 150 ms (compression ratio: 3.02%)
lz4: 30 ms (compression ratio: 2.3%)

Test 4: Test 3 with different large, slightly compressible random data
that needs to be compressed + externalized
Table: 3 varchar columns
Insert: Insert 1000 tuples, 3 columns of 8192 bytes each, using:
CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS
'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
Test: insert into t1 select large_val(), large_val(), large_val() from
generate_series(1,1000);
Result:
pglz: 2000 ms
lz4: 1500 ms

Conclusion:
1. In most cases lz4 is faster, and it compresses better as well.
2. In Test 2, when the small data is incompressible, lz4 tries to
compress whereas pglz doesn't try, so there is some performance loss.
But if we want, we can fix that by setting some minimum size limit for
lz4 as well, maybe the same size as for pglz?
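
If we decide to do that, the guard in the lz4 routine could be as small
as the sketch below (the constant name is invented here; 32 just
mirrors pglz's default min_input_size):

#define LZ4_MIN_COMPRESS_SIZE 32        /* invented name; pglz's default */

    /* In the lz4 handler, give up early on tiny inputs, as pglz does. */
    if (VARSIZE_ANY_EXHDR(value) < LZ4_MIN_COMPRESS_SIZE)
        return NULL;                    /* datum gets stored uncompressed */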

> - Rebase other patches atop this patch
> - comment in ddl.sgml

Other changes in the patch:
- Now we are dumping the default compression method in binary-upgrade
mode, so the pg_dump test needed some changes; fixed that.
- In compress_pglz.c and compress_lz4.c we were using toast_internals.h
macros, so I removed those and used the varlena macros instead.

While testing, I noticed that if the compressed data are externalized
then pg_column_compression() doesn't fetch the compression method from
the toast chunk; I think we should do that. I will analyze this and fix
it in the next version.
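
For inline-compressed values the method comes from the two bits above
the 30-bit raw size, per the VARLENA_RAWSIZE_* macros above; the
externalized case will have to read the same word from the first toast
chunk. A sketch of that bit layout, with hypothetical accessor names:

/* hypothetical accessors: method bits sit above the 30-bit raw size */
#define VARRAWSIZE_4B_C(ptr) \
    (((varattrib_4b *) (ptr))->va_compressed.va_rawsize & VARLENA_RAWSIZE_MASK)
#define VARCOMPRESS_4B_C(ptr) \
    (((varattrib_4b *) (ptr))->va_compressed.va_rawsize >> VARLENA_RAWSIZE_BITS)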

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v22-0001-Built-in-compression-method.patch application/octet-stream 234.8 KB
