Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-19 16:21:57
Message-ID: CAFiTN-um3+YD3Su=vEwgY8SgcQFuBY5xVvKgE4ks+=-zb6T+Sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 11, 2021 at 3:26 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> In CompareCompressionMethodAndDecompress, I think this is still
> playing a bit fast and loose with the rules around slots. I think we
> can do better. Suppose that at the point where we discover that we
> need to decompress at least one attribute, we create the new slot
> right then, and also memcpy tts_values and tts_isnull. Then, for that
> attribute and any future attributes that need decompression, we reset
> tts_values in the *new* slot, leaving the old one untouched. Then,
> after finishing all the attributes, the if (decompressed_any) block,
> you just have a lot less stuff to do. The advantage of this is that
> you haven't tainted the old slot; it's still got whatever contents it
> had before, and is in a clean state, which seems better to me.

Fixed

>
> It's unclear to me whether this function actually needs to
> ExecMaterializeSlot(newslot). It definitely does need to
> ExecStoreVirtualTuple(newslot) and I think it's a very good idea, if
> not absolutely mandatory, for it not to modify anything about the old
> slot. But what's the argument that the new slot needs to be
> materialized at this point? It may be needed, if the old slot would've
> had to be materialized at this point. But it's something to think
> about.

I think if the original slot was materialized then materialing the new
slot make more sense to me so done that way.

>
> The CREATE TABLE documentation says that COMPRESSION is a kind of
> column constraint, but that's wrong. For example, you can't write
> CREATE TABLE a (b int4 CONSTRAINT thunk COMPRESSION lz4), for example,
> contrary to what the syntax summary implies. When you fix this so that
> the documentation matches the grammar change, you may also need to
> move the longer description further up in create_table.sgml so the
> order matches.

Fixed

> The use of VARHDRSZ_COMPRESS in toast_get_compression_oid() appears to
> be incorrect. VARHDRSZ_COMPRESS is offsetof(varattrib_4b,
> va_compressed.va_data). But what gets externalized in the case of a
> compressed datum is just VARDATA(dval), which excludes the length
> word, unlike VARHDRSZ_COMPRESS, which does not. This has no
> consequences since we're only going to fetch 1 chunk either way, but I
> think we should make it correct.

Fixed

> TOAST_COMPRESS_SET_SIZE_AND_METHOD() could Assert something about cm_method.

While replying to the comments, I realised that I have missed it. I
will fix it in the next version.

> Small delta patch with a few other suggested changes attached.

Merged

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-02-19 16:22:05 Re: PoC Refactor AM analyse API
Previous Message Dilip Kumar 2021-02-19 16:18:47 Re: [HACKERS] Custom compression methods