Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-09 12:44:58
Message-ID: CAFiTN-tyWCqoHzsKzTbk1R6PMO0pC3g-=5=giB644rLjfZkCYA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 9, 2021 at 2:08 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Sun, Feb 7, 2021 at 5:15 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Fri, Feb 5, 2021 at 8:11 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > >
> > > > Even more review comments, still looking mostly at 0001:
> > > >
> > > > If there's a reason why parallel_schedule is arranging to run the
> > > > compression test in parallel with nothing else, the comment in that
> > > > file should explain the reason. If there isn't, it should be added to
> > > > a parallel group that doesn't have the maximum number of tests yet,
> > > > probably the last such group in the file.
> > > >
> > > > serial_schedule should add the test in a position that roughly
> > > > corresponds to where it appears in parallel_schedule.
> > >
> > > Done
> > >
> > > > I believe it's relatively standard practice to put variable
> > > > declarations at the top of the file. compress_lz4.c and
> > > > compress_pglz.c instead put those declarations nearer to the point of
> > > > use.
> > >
> > > Do you mean pglz_compress_methods and lz4_compress_methods? I
> > > followed that style from heapam_handler.c. If you think that doesn't
> > > look good, I can move them up.
> > >
> > > > compressamapi.c has an awful lot of #include directives for the code
> > > > it actually contains. I believe that we should cut that down to what
> > > > is required by 0001, and other patches can add more later as required.
> > > > In fact, it's tempting to just get rid of this .c file altogether and
> > > > make the two functions it contains static inline functions in the
> > > > header, but I'm not 100% sure that's a good idea.
> > >
> > > I think it looks better to move them to compressamapi.h, so I have done that.
> > >
> > > > The copyright dates in a number of the file headers are out of date.
> > >
> > > Fixed
> > >
> > > > binary_upgrade_next_pg_am_oid and the related changes to
> > > > CreateAccessMethod don't belong in 0001, because it doesn't support
> > > > non-built-in compression methods. These changes and the related
> > > > pg_dump change should be moved to the patch that adds support for
> > > > that.
> > >
> > > Fixed
> > >
> > > > The comments added to dumpTableSchema() say that "compression is
> > > > assigned by ALTER" but don't give a reason. I think they should. I
> > > > don't know how much they need to explain about what the code does, but
> > > > they definitely need to explain why it does it. Also, isn't this bad?
> > > > If we create the column with the wrong compression setting initially
> > > > and then ALTER it, we have to rewrite the table. If it's empty, that's
> > > > cheap, but it'd still be better not to do it at all.
> > >
> > > Yeah, actually that part should go in the 0003 patch, where we
> > > implement the custom compression method. In that patch we need to
> > > alter and set it because we want to keep the preserved method as
> > > well, so I will add it there.
> > >
> > > > I'm not sure it's a good idea for dumpTableSchema() to leave out
> > > > specifying the compression method if it happens to be pglz. I think we
> > > > definitely shouldn't do it in binary-upgrade mode. What if we changed
> > > > the default in a future release? For that matter, even 0002 could make
> > > > the current approach unsafe.... I think, anyway.
> > >
> > > Fixed
> > >
> > >
> > > > The changes to pg_dump.h look like they haven't had a visit from
> > > > pgindent. You should probably try to do that for the whole patch,
> > > > though it's a bit annoying since you'll have to manually remove
> > > > unrelated changes to the same files that are being modified by the
> > > > patch. Also, why the extra blank line here?
> > >
> > > Fixed, ran pgindent for other files as well.
> > >
> > > > GetAttributeCompression() is hard to understand. I suggest changing
> > > > the comment to "resolve column compression specification to an OID"
> > > > and somehow rejigger the code so that you aren't using one not-NULL
> > > > test and one NULL test on the same variable. Like maybe change the
> > > > first part to if (!IsStorageCompressible(typstorage)) { if
> > > > (compression == NULL) return InvalidOid; ereport(ERROR, ...); }
> > >
> > > Done
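> > >
> > > For reference, the resolved shape follows your suggestion (a sketch;
> > > the errcode and message text here are illustrative, not necessarily
> > > what the patch emits):
> > >
> > > /* resolve column compression specification to an OID */
> > > if (!IsStorageCompressible(typstorage))
> > > {
> > >     if (compression == NULL)
> > >         return InvalidOid;
> > >     ereport(ERROR,
> > >             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > >              errmsg("column data type does not support compression")));
> > > }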
> > >
> > > > It puzzles me that CompareCompressionMethodAndDecompress() calls
> > > > slot_getallattrs() just before clearing the slot. It seems like this
> > > > ought to happen before we loop over the attributes, so that we don't
> > > > need to call slot_getattr() every time. See the comment for that
> > > > function. But even if we didn't do that for some reason, why would we
> > > > do it here? If it's already been done, it shouldn't do anything, and
> > > > if it hasn't been done, it might overwrite some of the values we just
> > > > poked into tts_values. It also seems suspicious that we can get away
> > > > with clearing the slot and then again marking it valid. I'm not sure
> > > > it really works like that. Like, can't clearing the slot invalidate
> > > > pointers stored in tts_values[]? For instance, if they are pointers
> > > > into an in-memory heap tuple, tts_heap_clear() is going to free the
> > > > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is
> > > > going to unpin it. I think the supported procedure for this sort of
> > > > thing is to have a second slot, set tts_values, tts_isnull etc. and
> > > > then materialize the slot. After materializing the new slot, it's
> > > > independent of the old slot, which can then be cleared. See for
> > > > example tts_virtual_materialize(). The whole approach you've taken
> > > > here might need to be rethought a bit. I think you are right to want
> > > > to avoid copying everything over into a new slot if nothing needs to
> > > > be done, and I think we should definitely keep that optimization, but
> > > > I think if you need to copy stuff, you have to do the above procedure
> > > > and then continue using the other slot instead of the original one.
> > > > Some places I think we have functions that return either the original
> > > > slot or a different one depending on how it goes; that might be a
> > > > useful idea here. But, you also can't just spam-create slots; it's
> > > > important that whatever ones we end up with get reused for every
> > > > tuple.
> > >
> > > I have changed this algorithm: now, if we have to decompress
> > > anything, we use a new slot, and we stick that new slot into the
> > > ModifyTableState (DR_transientrel for matviews and DR_intorel for
> > > CTAS); the flow is roughly sketched below. Does this look okay, or do
> > > we need to do something else? If this logic looks fine, then maybe we
> > > can think about some more optimization and cleanup in this function.
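> > >
> > > To make that concrete, the new code has roughly this shape (a
> > > simplified sketch, not the patch itself; "newslot" is the slot cached
> > > on the ModifyTableState / DR_transientrel / DR_intorel, and
> > > "decompression_needed" stands in for the per-attribute check):
> > >
> > > slot_getallattrs(slot);         /* deform once, before the loop */
> > > if (!decompression_needed)
> > >     return slot;                /* fast path: no copying at all */
> > >
> > > ExecClearTuple(newslot);        /* cached slot, created once and reused */
> > > memcpy(newslot->tts_values, slot->tts_values, natts * sizeof(Datum));
> > > memcpy(newslot->tts_isnull, slot->tts_isnull, natts * sizeof(bool));
> > > /* ... replace compressed tts_values[i] with decompressed copies ... */
> > > ExecStoreVirtualTuple(newslot);
> > > ExecMaterializeSlot(newslot);   /* make it independent of "slot" */
> > > return newslot;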
> > >
> > >
> > > > Doesn't the change to describeOneTableDetails() require changing
> > > > the declaration of char *headers[11] to char *headers[12]?
> > > > How does this not fail Assert(cols <= lengthof(headers))?
> > >
> > > Fixed
> > >
> > > > Why does describeOneTableDetails() arrange to truncate the printed
> > > > value? We don't seem to do that for the other column properties, and
> > > > it's not like this one is particularly long.
> > >
> > > Not required, fixed.
> > >
> > > > Perhaps the changes to pg_am.dat shouldn't remove the blank line?
> > >
> > > Fixed
> > >
> > > > I think the comment in pg_attribute.h could be rephrased to say
> > > > something like: "OID of compression AM. Must be InvalidOid if and only
> > > > if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the
> > > > right letters are. This would be shorter and I think also clearer than
> > > > what you have.
> > >
> > > Fixed
> > >
> > > > The first comment change in postgres.h is wrong. You changed
> > > > va_extsize to "size in va_extinfo" but the associated structure
> > > > definition is unchanged, so the comment shouldn't be changed either.
> > >
> > > Yup, not required.
> > >
> > > > In toast_internals.h, you end up using 30 as a constant several times but
> > > > have no #define for it. You do have a #define for RAWSIZEMASK, but
> > > > that's really a derived value from 30. Also, it's not a great name
> > > > because it's kind of generic. So how about something like:
> > > >
> > > > #define TOAST_RAWSIZE_BITS 30
> > > > #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAWSIZE_BITS + 1)) - 1)
> > > >
> > > > But then again on second thought, this 30 seems to be the same 30 that
> > > > shows up in the changes to postgres.h, and there again 0x3FFFFFFF
> > > > shows up too. So maybe we should actually be defining these constants
> > > > there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK
> > > > and then having toast_internals.h use those constants as well.
> > >
> > > Done. IMHO it should be (without the +1, so the value matches the
> > > existing 0x3FFFFFFF):
> > > #define VARLENA_RAWSIZE_BITS 30
> > > #define VARLENA_RAWSIZE_MASK ((1 << VARLENA_RAWSIZE_BITS) - 1)
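> > >
> > > A compile-time check could pin that down (a sketch; StaticAssertDecl
> > > is the file-scope assertion macro from c.h):
> > >
> > > StaticAssertDecl(VARLENA_RAWSIZE_MASK == 0x3FFFFFFF,
> > >                  "VARLENA_RAWSIZE_MASK must match the old RAWSIZEMASK");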
> > >
> > >
> > > > Taken with the email I sent yesterday, I think this is a more or less
> > > > complete review of 0001. Although there are a bunch of things to fix
> > > > here still, I don't think this is that far from being committable. I
> > > > don't at this point see too much in terms of big design problems.
> > > > Probably the CompareCompressionMethodAndDecompress() is the closest to
> > > > a design-level problem, and certainly something needs to be done about
> > > > it, but even that is a fairly localized problem in the context of the
> > > > entire patch.
> > >
> > > 0001 is attached; the pending parts now are:
> > >
> > > - Confirm the new design of CompareCompressionMethodAndDecompress
> > > - Performance test, especially lz4 with small varlena
> >
> > I have tested the performance of pglz vs. lz4.
> >
> > Test1: With a small, simple (compressible) string, pglz doesn't
> > attempt compression but lz4 does, since lz4 has no minimum size limit.
> > Table: 100 varchar columns
> > Test: Insert 1000 tuples, each column a 25-byte string (32 bytes is
> > the minimum limit for pglz)
> > Result:
> > pglz: 1030 ms (doesn't attempt compression, so values are externalized)
> > lz4: 212 ms
> >
> > Test2: With a small incompressible string, pglz doesn't attempt
> > compression; lz4 attempts it but cannot compress.
> > Table: 100 varchar columns
> > Test: Insert 1000 tuples, each column a 25-byte string (32 bytes is
> > the minimum limit for pglz)
> > Result:
> > pglz: 1030 ms (doesn't attempt compression, so values are externalized)
> > lz4: 1090 ms (attempts to compress but still externalizes)
> >
> > Test3: Test a few columns with large random data
> > Table: 3 varchar columns
> > Test: Insert 1000 tuples, 3 columns of sizes 3500 bytes, 4200 bytes,
> > and 4900 bytes
> > pglz: 150 ms (compression ratio: 3.02%)
> > lz4: 30 ms (compression ratio: 2.3%)
> >
> > Test4: Like Test3, but with large, random, slightly compressible data
> > that needs to be compressed and then externalized:
> > Table: 3 varchar columns
> > Insert: 1000 tuples, 3 columns of 8192 bytes each
> > CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS
> > 'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
> > Test: insert into t1 select large_val(), large_val(), large_val() from
> > generate_series(1,1000);
> > pglz: 2000 ms
> > lz4: 1500 ms
> >
> > Conclusion:
> > 1. In most cases lz4 is faster and compresses better as well.
> > 2. In Test2, where the small data are incompressible, lz4 tries to
> > compress whereas pglz doesn't, so there is some performance loss. If
> > we want, we can fix that by setting a minimum size limit for lz4 as
> > well, maybe the same as pglz's? (A sketch of such a gate follows.)
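> >
> > For example, something along these lines in the lz4 compression
> > routine would mirror pglz's behaviour (a sketch only; "valsize" stands
> > for the input size, and returning a NULL pointer datum follows the
> > toast_compress_datum() convention for "store uncompressed"):
> >
> > /* Skip the compression attempt for tiny inputs, the way pglz does. */
> > if (valsize < PGLZ_strategy_default->min_input_size)
> >     return PointerGetDatum(NULL);   /* caller stores value uncompressed */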
> >
> > > - Rebase other patches atop this patch
> > > - comment in ddl.sgml
> >
> > Other changes in the patch:
> > - We now dump the default compression method in binary-upgrade mode,
> > so the pg_dump test needed some changes; fixed that.
> > - In compress_pglz.c and compress_lz4.c we were using
> > toast_internals.h macros, so I removed those and used the varlena
> > macros instead.
> >
> > While testing, I noticed that if the compressed data are
> > externalized, then pg_column_compression() doesn't fetch the
> > compression method from the toast chunk. I think we should do that; I
> > will analyze this and fix it in the next version.
>
> While trying to fix this, I realized that the same problem exists in
> CompareCompressionMethodAndDecompress(); see the code below.
> ---
> + new_value = (struct varlena *)
> + DatumGetPointer(slot->tts_values[attnum - 1]);
> +
> + /* nothing to be done, if it is not compressed */
> + if (!VARATT_IS_COMPRESSED(new_value))
> + continue;
> ---
>
> Basically, we are just checking whether the stored value is
> compressed, but we are ignoring the fact that it might be compressed
> and stored externally on disk. So if the value is stored externally,
> we cannot know whether the external data were compressed without
> fetching them from the toast table. Instead of fetching the complete
> datum from toast, I think we can just fetch the header using
> toast_fetch_datum_slice().
>
> Any other thoughts on this?

I think I was partially wrong here. There is, in fact, a way to know
whether the external data are compressed: the
VARATT_EXTERNAL_IS_COMPRESSED macro. However, if they are compressed,
we still have to fetch a toast slice of sizeof(toast_compress_header)
bytes to learn the compression method.
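
Roughly like this (a sketch, not the final code; toast_fetch_datum_slice()
is currently static in detoast.c, so it would either have to be exposed or
this logic would live there):

if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
    struct varatt_external toast_pointer;

    VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

    if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
    {
        /* fetch only the bytes covering the compression header */
        struct varlena *slice =
            toast_fetch_datum_slice(attr, 0,
                                    sizeof(toast_compress_header));

        /* ... read the compression method from the slice ... */
    }
}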

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
