Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-19 14:11:03
Message-ID: CAFiTN-t03PqWCaRW1GoWQa0i2NwYw_p=FDj+Kxu_-rY_viALqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 19, 2021 at 1:27 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Mar 18, 2021 at 10:22 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > I just realized that in the last patch (0003) I forgot to remove 2
> > unused functions, CompressionMethodToId and CompressionIdToMethod.
> > Removed in the latest patch.
>
> I spent a little time polishing 0001 and here's what I came up with. I
> adjusted some comments, added documentation, fixed up the commit
> message, etc.

Thanks, the changes looks fine to me.
>
> I still don't quite like the approach in 0002. I feel that the
> function should not construct the tuple but modify the caller's arrays
> as a side effect. And if we're absolutely committed to the design
> where it does that, the comments need to call it out clearly, which
> they don't.

Added comment for the same.

> Regarding 0003:
>
> I think it might make sense to change the names of the compression and
> decompression functions to match the names of the callers more
> closely. Like, toast_decompress_datum() calls either
> pglz_cmdecompress() or lz4_cmdecompress(). But, why not
> pglz_decompress_datum() or lz4_decompress_datum()? The "cm" thing
> doesn't really mean anything, and because the varlena is allocated by
> that function itself rather than the caller, this can't be used for
> anything other than TOAST.

Done

> In toast_compress_datum(), if (tmp == NULL) return
> PointerGetDatum(NULL) is duplicated. It would be better to move it
> after the switch.

Done

> Instead of "could not compress data with lz4" I suggest "lz4
> compression failed".

Done

> In catalogs.sgml, you shouldn't mention InvalidCompressionMethod, but
> you should explain what the actual possible values mean. Look at the
> way attidentity and attgenerated are documented and do it like that.

Done

> In pg_column_compression() it might be a bit more elegant to add a
> char *result variable or similar, and have the switch cases just set
> it, and then do PG_RETURN_TEXT_P(cstring_to_text(result)) at the
> bottom.

Done

> In getTableAttrs(), if the remoteVersion is new, the column gets a
> different alias than if the column is old.

Fixed

> In dumpTableSchema(), the condition tbinfo->attcompression[j] means
> exactly the thing as the condition tbinfo->attcompression[j] != '\0',
> so it can't be right to test both. I think that there's some confusion
> here about the data type of tbinfo->attcompression[j]. It seems to be
> char *. Maybe you intended to test the first character in that second
> test, but that's not what this does. But you don't need to test that
> anyway because the switch already takes care of it. So I suggest (a)
> removing tbinfo->attcompression[j] != '\0' from this if-statement and
> (b) adding != NULL to the previous line for clarity. I would also
> suggest concluding the switch with a break just for symmetry.

Fixed

> The patch removes 11 references to va_extsize and leaves behind 4.
> None of those 4 look like things that should have been left.

Fixed

> The comment which says "When fetching a prefix of a compressed
> external datum, account for the rawsize tracking amount of raw data,
> which is stored at the beginning as an int32 value)" is no longer 100%
> accurate. I suggest changing it to say something like "When fetching a
> prefix of a compressed external datum, account for the space required
> by va_tcinfo" and leave out the rest.

Done

> In describeOneTableDetails, the comment "compresssion info" needs to
> be compressed by removing one "s".

Done

> It seems a little unfortunate that we need to include
> access/toast_compression.h in detoast.h. It seems like the reason we
> need to do that is because otherwise we won't have ToastCompressionId
> defined and so we won't be able to prototype toast_get_compression_id.
> But I think we should solve that problem by moving that file to
> toast_compression.c. (I'm OK if you want to keep the files separate,
> or if you want to reverse course and combine them I'm OK with that
> too, but the extra header dependency is clearly a sign of a problem
> with the split.)

Moved to toast_compression.c

> Regarding 0005:
>
> I think ApplyChangesToIndexes() should be renamed to something like
> SetIndexStorageProperties(). It's too generic right now.

Done

> I think 0004 and 0005 should just be merged into 0003. I can't see
> committing them separately. I know I was the one who made you split
> the patch up in the first place, but those patches are quite small and
> simple now, so it makes more sense to me to combine them.

Done

Also added a test case for vacuum full to recompress the data.

One question, like storage should we apply the alter set compression
changes recursively to the inherited children (I have attached a
separate patch for this )?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v39-0001-Invent-HeapTupleGetRawDatum-and-friends.patch text/x-patch 40.4 KB
v39-0002-Expand-the-external-data-before-forming-the-tupl.patch text/x-patch 8.3 KB
v39-0004-default-to-with-lz4.patch text/x-patch 1.7 KB
v39-0003-Built-in-compression-method.patch text/x-patch 132.4 KB
recursive_set_compression.patch text/x-patch 583 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-03-19 14:15:44 Re: [PATCH] pg_stat_statements dealloc field ignores manual deallocation
Previous Message David Steele 2021-03-19 14:09:45 Re: support for MERGE