Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-16 14:20:35
Message-ID: CAFiTN-vgkoXYqqtJxwqvh_UjZ7WZhN=-485YWSVW_U6_L9KOQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 13, 2021 at 8:14 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Feb 11, 2021 at 8:17 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Thu, Feb 11, 2021 at 7:36 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > W.R.T the attached patch, In HeapTupleHeaderGetDatum, we don't even
> > > attempt to detoast if there is no external field in the tuple, in POC
> > > I have got rid of that check, but I think we might need to do better.
> > > Maybe we can add a flag in infomask to detect whether the tuple has
> > > any compressed data or not as we have for detecting the external data
> > > (HEAP_HASEXTERNAL).
> >
> > No. This feature isn't close to being important enough to justify
> > consuming an infomask bit.
>
> Okay,
>
> > I don't really see why we need it anyway. If array construction
> > already categorically detoasts, why can't we do the same thing here?
> > Would it really cost that much? In what case? Having compressed values
> > in a record we're going to store on disk actually seems like a pretty
> > dumb idea. We might end up trying to recompress something parts of
> > which have already been compressed.
> >
>
> If we refer the comments atop function "toast_flatten_tuple_to_datum"
>
> ---------------
> * We have a general rule that Datums of container types (rows, arrays,
> * ranges, etc) must not contain any external TOAST pointers. Without
> * this rule, we'd have to look inside each Datum when preparing a tuple
> * for storage, which would be expensive and would fail to extend cleanly
> * to new sorts of container types.
> *
> * However, we don't want to say that tuples represented as HeapTuples
> * can't contain toasted fields, so instead this routine should be called
> * when such a HeapTuple is being converted into a Datum.
> *
> * While we're at it, we decompress any compressed fields too. This is not
> * necessary for correctness, but reflects an expectation that compression
> * will be more effective if applied to the whole tuple not individual
> * fields. We are not so concerned about that that we want to deconstruct
> * and reconstruct tuples just to get rid of compressed fields, however.
> * So callers typically won't call this unless they see that the tuple has
> * at least one external field.
> ----------------
>
> It appears that the general rule we want to follow is that while
> creating the composite type we want to flatten any external pointer,
> but while doing that we also decompress any compressed field with the
> assumption that compressing the whole row/array will be a better idea
> instead of keeping them compressed individually. However, if there
> are no external toast pointers then we don't want to make an effort to
> just decompress the compressed data.
>
> Having said that I don't think this rule is followed throughout the
> code for example
>
> 1. "ExecEvalRow" is calling HeapTupleHeaderGetDatum only if there is
> any external field and which is calling "toast_flatten_tuple_to_datum"
> so this is following the rule.
> 2. "ExecEvalWholeRowVar" is calling "toast_build_flattened_tuple", but
> this is just flattening the external toast pointer but not doing
> anything to the compressed data.
> 3. "ExecEvalArrayExpr" is calling "construct_md_array", which will
> detoast the attribute if attlen is -1, so this will decompress any
> compressed data even though there is no external toast pointer.
>
> So in 1 we are following the rule but in 2 and 3 we are not.
>
> IMHO, for the composite data types we should make common a rule and we
> should follow that everywhere. As you said it will be good if we can
> always detoast any external/compressed data, that will help in getting
> better compression as well as fetching the data will be faster because
> we can avoid multi level detoasting/decompression. I will analyse
> this further and post a patch for the same.

I have done further analysis of this issue and came up with the
attached patch. So with this patch, like external toast posiners we
will not allow any compressed data also in the composite types. The
problem is that now we will be processing all the tuple while forming
the composite type irrespective of the source of the tuple, I mean if
user is directly inserting into the array type and not selecting from
another table then there will not be any compressed data so now
checking each field of tuple for compressed data is unnecessary but I
am not sure how to distinguish between those cases.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Disallow-compressed-data-inside-container-types.patch text/x-patch 6.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message chenhj 2021-02-16 14:45:59 Re: [Proposal] Page Compression for OLTP
Previous Message Anastasia Lubennikova 2021-02-16 14:07:28 Re: Performing partition pruning using row value