Re: [proposal] de-TOAST'ing using a iterator

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Binguo Bao <djydewang(at)gmail(dot)com>
Cc: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Владимир Лесков <vladimirlesk(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [proposal] de-TOAST'ing using a iterator
Date: 2019-09-03 20:12:26
Message-ID: 20190903201226.GA16197@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> +static void
> +init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed)
> +{
> + buf->buf = (const char *) palloc0(size);

This API is weird -- you always palloc the ToastBuffer first, then call
init_toast_bufer on it. Why not palloc the ToastBuffer struct in
init_toast_buffer and return it from there instead? This is
particularly strange since the ToastBuffer itself is freed by the "free"
routine ... so it's not like we're thinking that caller can take
ownership of the struct by embedding it in a larger struct.

Also, this function needs a comment on top explaining what it does and
what the params are.

Why do we need ToastBuffer->buf_size? Seems unused.

> + if (iter == NULL)
> + {
> + return;
> + }

Please, no braces around single-statement blocks. (Many places).

> +/*
> + * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
> + * the field is invalid and need to read the control byte from the
> + * source buffer in the next iteration, see pglz_decompress_iterate().
> + */
> +#define INVALID_CTRLC 8

What does CTRLC stand for? Also: this comment should explain why the
value 8 is what it is.

> + /*
> + * Now we copy the bytes specified by the tag from OUTPUT to
> + * OUTPUT. It is dangerous and platform dependent to use
> + * memcpy() here, because the copied areas could overlap
> + * extremely!
> + */
> + len = Min(len, destend - dp);
> + while (len--)
> + {
> + *dp = dp[-off];
> + dp++;
> + }

So why not use memmove?

> + /*
> + * Otherwise it contains the match length minus 3 and the
> + * upper 4 bits of the offset. The next following byte
> + * contains the lower 8 bits of the offset. If the length is
> + * coded as 18, another extension tag byte tells how much
> + * longer the match really was (0-255).
> + */
> + int32 len;
> + int32 off;
> +
> + len = (sp[0] & 0x0f) + 3;
> + off = ((sp[0] & 0xf0) << 4) | sp[1];
> + sp += 2;
> + if (len == 18)
> + len += *sp++;

Starting this para with "Otherwise" makes no sense, since there's no
previous opposite case. Please reword. However, I don't recognize this
code from anywhere, and it seems to have a lot of magical numbers. Is
this code completely new?

Didn't much like FetchDatumIteratorData SnapshotToast struct member
name. How about just "snapshot"?

> +#define PG_DETOAST_ITERATE(iter, need) \
> + do { \
> + Assert(need >= iter->buf->buf && need <= iter->buf->capacity); \
> + while (!iter->done && need >= iter->buf->limit) { \
> + detoast_iterate(iter); \
> + } \
> + } while (0)

This needs parens around each "iter" and "need" in the macro definition.
Also, please add a comment documenting what the arguments are, since
it's not immediately obvious.

> +void free_detoast_iterator(DetoastIterator iter)
> +{
> + if (iter == NULL)
> + {
> + return;
> + }

If this function is going to do this, why do callers need to check for
NULL also? Seems pointless. I'd rather make callers simpler and keep
only the NULL-check inside the function, since this is not perf-critical
anyway.

> +extern void detoast_iterate(DetoastIterator detoast_iter)
> +{

Please, no "extern" in function definitions, only in prototypes in the
.h files. Also, we indent the function name at the start of line, with
the return type appearing on its own in the previous line.

> + if (!VARATT_IS_EXTERNAL_ONDISK(attr))
> + elog(ERROR, "create_fetch_datum_itearator shouldn't be called for non-ondisk datums");

Typo for "iterator".

> + iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
> + VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
> + if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> + {
> [...]
> + }
> + else
> + {
> + iter->compressed = false;
> +
> + /* point the buffer directly at the raw data */
> + iter->buf = iter->fetch_datum_iterator->buf;
> + }

This arrangement where there are two ToastBuffers and they sometimes are
the same is cute, but I think we need a better way to know when each
needs to be freed afterwards; the proposed coding is confusing. And it
certainly it needs more than zero comments about what's going on there.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pierre Ducroquet 2019-09-03 20:38:06 Re: [Patch] Add a reset_computed_values function in pg_stat_statements
Previous Message Stephen Frost 2019-09-03 19:56:28 Re: Proposal: roll pg_stat_statements into core