Re: Compressed TOAST Slicing

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Regina Obe <r(at)pcorp(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Compressed TOAST Slicing
Date: 2019-03-19 11:47:45
Message-ID: 20190319114744.GS6197@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Paul Ramsey (pramsey(at)cleverelephant(dot)ca) wrote:
> > On Mar 18, 2019, at 7:34 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > +1. I think Paul had it right originally.
>
> In that spirit, here is a “one pglz_decompress function, new parameter” version for commit.

Alright, I've been working through this and have made a few improvements
(the big comment block at the top of pg_lzcompress.c needed updating,
among a couple other minor things), but I was trying to wrap my head
around this:

+/* ----------
+ * toast_decompress_datum_slice -
+ *
+ * Decompress the front of a compressed version of a varlena datum.
+ * offset handling happens in heap_tuple_untoast_attr_slice.
+ * Here we just decompress a slice from the front.
+ */
+static struct varlena *
+toast_decompress_datum_slice(struct varlena *attr, int32 slicelength)
+{
+ struct varlena *result;
+ int32 rawsize;
+
+ Assert(VARATT_IS_COMPRESSED(attr));
+
+ result = (struct varlena *) palloc(slicelength + VARHDRSZ);
+ SET_VARSIZE(result, slicelength + VARHDRSZ);
+
+ rawsize = pglz_decompress(TOAST_COMPRESS_RAWDATA(attr),
+ VARSIZE(attr) - TOAST_COMPRESS_HDRSZ,
+ VARDATA(result),
+ slicelength, false);
+ if (rawsize < 0)
elog(ERROR, "compressed data is corrupted");

+ SET_VARSIZE(result, rawsize + VARHDRSZ);
return result;
}

Specifically, the two SET_VARSIZE() calls, do we really need both..?
Are we sure that we're setting the length correctly there..? Is there
any cross-check we can do?

I have to admit that I find the new argument to pglz_decompress() a bit
awkward to describe and document; if you have any thoughts as to how
that could be improved, that'd be great.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-03-19 11:49:20 Re: partitioned tables referenced by FKs
Previous Message Masahiko Sawada 2019-03-19 11:22:33 Re: [HACKERS] Block level parallel vacuum