Re: [proposal] de-TOAST'ing using a iterator

From: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
To: Binguo Bao <djydewang(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Владимир Лесков <vladimirlesk(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [proposal] de-TOAST'ing using a iterator
Date: 2019-08-19 04:55:37
Message-ID: CACPNZCuTJk5SUsjQAZqdg43OTq8+QX6Xi49BWVUmZhJuH67efw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 16, 2019 at 10:48 PM Binguo Bao <djydewang(at)gmail(dot)com> wrote:
> [v8 patch with cosmetic changes]

Okay, looks good. I'll make a few style suggestions and corrections.
In the course of looking at this again, I have a few other questions
below as well.

It looks like you already do this for the most part, but I'll mention
that we try to keep lines, including comments, less than 80 characters
long. pgindent can try to fix that, but the results don't always look
nice.

About variable names: The iterator pointers are variously called
"iter", "iterator", and "fetch_iter". I found this confusing the first
time I read this code. I think we should use "iter" if we have only
one kind in the function, and "detoast_iter" and "fetch_iter" if we
have both kinds.
--

init_detoast_iterator():

+ * The "iterator" variable is normally just a local variable in the caller.

I don't think this comment is helpful to understand this function or its use.

+ * It only make sense to initialize de-TOAST iterator for external
on-disk value.

s/make/makes/
"a" de-TOAST iterator
s/value/values/

The comments in this function that start with "This is a ..." could be
shortened like this:

/* indirect pointer -- dereference it */

While looking at this again, I noticed we no longer need to test for
the in-line compressed case at all. I also tried some other cosmetic
rearrangements. Let me know what you think about the attached patch.
Also, I wonder if the VARATT_IS_EXTERNAL_INDIRECT case should come
first. Then the two normal cases are next to eachother.

free_detoast_iterator(), free_fetch_datum_iterator(), and free_toast_buffer():

These functions should return void.

+ * Free the memory space occupied by the de-TOAST iterator include buffers and
+ * fetch datum iterator.

Perhaps "Free memory used by the de-TOAST iterator, including buffers
and fetch datum iterator."

The check

if (iter->buf != iter->fetch_datum_iterator->buf)

is what we need to do for the compressed case. Could we use this
directly instead of having a separate state variable iter->compressed,
with a macro like this?

#define TOAST_ITER_COMPRESSED(iter) \
(iter->buf != iter->fetch_datum_iterator->buf)

Or maybe that's too clever?

detoast_iterate():

+ * As long as there is another data chunk in compression or external storage,

We no longer use the iterator with in-line compressed values.

+ * de-TOAST it into toast buffer in iterator.

Maybe "into the iterator's toast buffer"

fetch_datum_iterate():

My remarks for detoast_iterate() also apply here.

init_toast_buffer():

+ * Note the constrain buf->position <= buf->limit may be broken
+ * at initialization. Make sure that the constrain is satisfied
+ * when consume chars.

s/constrain/constraint/ (2 times)
s/consume/consuming/

Also, this comment might be better at the top the whole function?

pglz_decompress_iterate():

+ * Decompresses source into dest until the source is exhausted.

This comment is from the original function, but I think it would be
better to highlight the differences from the original, something like:

"This function is based on pglz_decompress(), with these additional
requirements:

1. We need to save the current control byte and byte position for the
caller's next iteration.

2. In pglz_decompress(), we can assume we have all the source bytes
available. This is not the case when we decompress one chunk at a
time, so we have to make sure that we only read bytes available in the
current chunk."

(I'm not sure about the term 'byte position', maybe there's a better one.)

+ * In the while loop, sp may go beyond the srcend, provides a four-byte
+ * buffer to prevent sp from reading unallocated bytes from source buffer.
+ * When source->limit reaches source->capacity, don't worry about reading
+ * unallocated bytes.

Here's my suggestion:

"In the while loop, sp may be incremented such that it points beyond
srcend. To guard against reading beyond the end of the current chunk,
we set srcend such that we exit the loop when we are within four bytes
of the end of the current chunk. When source->limit reaches
source->capacity, we are decompressing the last chunk, so we can (and
need to) read every byte."

+ for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)

Note you can also replace 8 with INVALID_CTRLC here.

tuptoaster.h:
+ * Constrains that need to be satisfied:

s/constrains/constraints/

+ * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
+ * the field is invalid and need to read the control byte from the
+ * source buffer in the next iteration, see pglz_decompress_iterate().
+ */
+#define INVALID_CTRLC 8

I think the macro might be better placed in pg_lzcompress.h, and for
consistency used in pglz_decompress(). Then the comment can be shorter
and more general. With my additional comment in
init_detoast_iterator(), hopefully it will be clear to readers.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
rearrange-init-iter.patch application/x-patch 2.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-08-19 05:51:00 Re: Add "password_protocol" connection parameter to libpq
Previous Message Dilip Kumar 2019-08-19 04:37:13 Re: POC: Cleaning up orphaned files using undo logs