Re: [proposal] de-TOAST'ing using a iterator

From: Binguo Bao <djydewang(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Владимир Лесков <vladimirlesk(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [proposal] de-TOAST'ing using a iterator
Date: 2019-08-03 16:11:21
Message-ID: CAL-OGkstcrQ8bz_+Dt10Qc2SXeoNqpj-GvPmArmjRm4s8-Agrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> 于2019年8月2日周五 下午3:12写道:

> On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao <djydewang(at)gmail(dot)com> wrote:
> >
> > John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> 于2019年7月29日周一 上午11:49写道:
> >>
> >> 1). For every needle comparison, text_position_next_internal()
> >> calculates how much of the value is needed and passes that to
> >> detoast_iterate(), which then calculates if it has to do something or
> >> not. This is a bit hard to follow. There might also be a performance
> >> penalty -- the following is just a theory, but it sounds plausible:
> >> The CPU can probably correctly predict that detoast_iterate() will
> >> usually return the same value it did last time, but it still has to
> >> call the function and make sure, which I imagine is more expensive
> >> than advancing the needle. Ideally, we want to call the iterator only
> >> if we have to.
> >>
> >> In the attached patch (applies on top of your v5),
> >> text_position_next_internal() simply compares hptr to the detoast
> >> buffer limit, and calls detoast_iterate() until it can proceed. I
> >> think this is clearer.
> >
> >
> > Yes, I think this is a general scenario where the caller continually
> > calls detoast_iterate until gets enough data, so I think such operations
> can
> > be extracted as a macro, as I did in patch v6. In the macro, the
> detoast_iterate
> > function is called only when the data requested by the caller is greater
> than the
> > buffer limit.
>
> I like the use of a macro here. However, I think we can find a better
> location for the definition. See the header comment of fmgr.h:
> "Definitions for the Postgres function manager and function-call
> interface." Maybe tuptoaster.h is as good a place as any?
>

PG_DETOAST_ITERATE isn't a sample function-call interface,
But I notice that PG_FREE_IF_COPY is also defined in fmgr.h, whose logic is
similar to PG_DETOAST_ITERATE, make condition check first and then
decide whether to call the function. Besides, PG_DETOAST_DATUM,
PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE,
PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, it is reasonable
to put all the de-TOAST interface together.

>> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
> >> don't check it or do anything with it. Should we do something with it?
> >> It's also not yet clear if we should check the iterator state instead
> >> of return values. I've added some XXX comments as a reminder. We
> >> should also check the return value of pglz_decompress_iterate().
> >
> >
> > IMO, we need to provide users with a simple iterative interface.
> > Using the required data pointer to compare with the buffer limit is an
> easy way.
> > And the application scenarios of the iterator are mostly read operations.
> > So I think there is no need to return a value, and the iterator needs to
> throw an
> > exception for some wrong calls, such as all the data have been iterated,
> > but the user still calls the iterator.
>
> Okay, and see these functions now return void. The orignal
> pglz_decompress() returned a value that was check against corruption.
> Is there a similar check we can do for the iterative version?
>

As far as I know, we can just do such check after all compressed data is
decompressed.
If we are slicing, we can't do the check.

>
> >> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
> >> pglz_decompress(), and I have some questions on it:
> >>
> >> a).
> >> + srcend = (const unsigned char *) (source->limit == source->capacity
> >> ? source->limit : (source->limit - 4));
> >>
> >> What does the 4 here mean in this expression?
> >
> >
> > Since we fetch chunks one by one, if we make srcend equals to the source
> buffer limit,
> > In the while loop "while (sp < srcend && dp < destend)", sp may exceed
> the source buffer limit and read unallocated bytes.
>
> Why is this? That tells me the limit is incorrect. Can the setter not
> determine the right value?
>

There are three statments change `sp` value in the while loop `while (sp <
srcend && dp < destend)`:
`ctrl = *sp++;`
`off = ((sp[0]) & 0xf0) << 4) | sp[1]; sp += 2;`
`len += *sp++`
Although we make sure `sp` is less than `srcend` when enter while loop,
`sp` is likely to
go beyond the `srcend` in the loop, and we should ensure that `sp` is
always smaller than `buf->limit` to avoid
reading unallocated data. So, `srcend` can't be initialized to
`buf->limit`. Only one case is exceptional,
we've fetched all data chunks and 'buf->limit' reaches 'buf->capacity',
it's imposisble to read unallocated
data via `sp`.

> Giving a four-byte buffer can prevent sp from exceeding the source buffer
> limit.
>
> Why 4? That's a magic number. Why not 2, or 27?
>

As I explained above, `sp` may go beyond the `srcend`in the loop, up to the
`srcend + 2`.
In theory, it's ok to set the buffer size to be greater than or equal 2.

> > If we have read all the chunks, we don't need to be careful to cross the
> border,
> > just make srcend equal to source buffer limit. I've added comments to
> explain it in patch v6.
>
> That's a good thing to comment on, but it doesn't explain why.

Yes, the current comment is puzzling. I'll improve it.

> This
> logic seems like a band-aid and I think a committer would want this to
> be handled in a more principled way.
>

I don't want to change pglz_decompress logic too much, the iterator should
pay more attention to saving and restoring the original pglz_decompress
state.

> >> Is it possible it's
> >> compensating for this bit in init_toast_buffer()?
> >>
> >> + buf->limit = VARDATA(buf->buf);
> >>
> >> It seems the initial limit should also depend on whether the datum is
> >> compressed, right? Can we just do this:
> >>
> >> + buf->limit = buf->position;
> >
> >
> > I'm afraid not. buf->position points to the data portion of the buffer,
> but the beginning of
> > the chunks we read may contain header information. For example, for
> compressed data chunks,
> > the first four bytes record the size of raw data, this means that limit
> is four bytes ahead of position.
> > This initialization doesn't cause errors, although the position is less
> than the limit in other cases.
> > Because we always fetch chunks first, then decompress it.
>
> I see what you mean now. This could use a comment or two to explain
> the stated constraints may not actually be satisfied at
> initialization.
>

Done.

> >> b).
> >> - while (sp < srcend && dp < destend)
> >> ...
> >> + while (sp + 1 < srcend && dp < destend &&
> >> ...
> >>
> >> Why is it here "sp + 1"?
> >
> >
> > Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch
> v6 to
> > achieve the purpose of parsing ctrl correctly every time.
>
> Please explain further. Was the "sp + 1" correct behavior (and why),
> or only for debugging setting ctrl/c correctly?

In patch v5, If the condition is `sp < srcend`, suppose `sp = srcend - 1`
before
entering the loop `while (sp < srcend && dp < destend)`, when entering the
loop
and read a control byte(sp equals to `srcend` now), the program can't enter
the
loop `for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`, then set
`iter->ctrlc` to 0,
exit the first loop and then this iteration is over. At the next iteration,
the control byte will be reread since `iter->ctrlc` equals to 0, but the
previous control byte
is not used. Changing the condition to `sp + 1 < srcend` avoid only one
control byte is read
then the iterator is over.

> Also, I don't think
> the new logic for the ctrl/c variables is an improvement:
>
> 1. iter->ctrlc is intialized with '8' (even in the uncompressed case,
> which is confusing). Any time you initialize with something not 0 or
> 1, it's a magic number, and here it's far from where the loop variable
> is used. This is harder to read.
>

`iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress at
the end of
the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid value
is 0~7,
When `ctrlc` reaches 8, a control byte is read from the source
buffer to `ctrl` then set `ctrlc` to 0. And a control bytes should be read
from the
source buffer to `ctrlc` on the first iteration. So `iter->ctrlc` should be
intialized with '8'.

> 2. First time though the loop, iter->ctrlc = 8, which immediately gets
> set back to 0.
>

As I explained above, `iter->ctrlc = 8` make a control byte be read
from the source buffer to `ctrl` on the first iteration. Besides,
`iter->ctrlc = 8`
indicates that the valid value of `ctrlc` at the end of the last iteration
was not
recorded, Obviously, there are no other iterations before the first
iteration.

> 3. At the end of the loop, iter->ctrl/c are unconditionally set. In
> v5, there was a condition which would usually avoid this copying of
> values through pointers.
>

Patch v6 just records the value of `ctrlc` at the end of each iteration(or
loop)
whether it is valid (0~7) or 8, and initializes `ctrlc` on the next
iteration(or loop) correctly.
I think it is more concise in patch v6.

>
> >> 4. Note that varlena.c has a static state variable, and a cleanup
> >> function that currently does:
> >>
> >> static void
> >> text_position_cleanup(TextPositionState *state)
> >> {
> >> /* no cleanup needed */
> >> }
> >>
> >> It seems to be the detoast iterator could be embedded in this state
> >> variable, and then free-ing can happen here. That has a possible
> >> advantage that the iterator struct would be on the same cache line as
> >> the state data. That would also remove the need to pass "iter" as a
> >> parameter, since these functions already pass "state". I'm not sure if
> >> this would be good for other users of the iterator, so maybe we can
> >> hold off on that for now.
> >
> >
> > Good idea. I've implemented it in patch v6.
>
> That's better, and I think we can take it a little bit farther.
>
> 1. Notice that TextPositionState is allocated on the stack in
> text_position(), which passes both the "state" pointer and the "iter"
> pointer to text_position_setup(), and only then sets state->iter =
> iter. We can easily set this inside text_position(). That would get
> rid of the need for other callers to pass NULL iter to
> text_position_setup().
>

Done.

> 2. DetoastIteratorData is fixed size, so I see no reason to allocate
> it on the heap. We could allocate it on the stack in text_pos(), and
> pass the pointer to create_detoast_iterator() (in this case maybe a
> better name is init_detoast_iterator), which would return a bool to
> tell text_pos() whether to pass down the pointer or a NULL. The
> allocation of other structs (toast buffer and fetch iterator) probably
> can't be changed without more work.
>

Done

If there is anything else that is not explained clearly, please point it
out.

--
Best regards,
Binguo Bao

Attachment Content-Type Size
0001-de-TOASTing-using-a-iterator-7.patch text/x-patch 19.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-08-03 16:15:06 Re: Fix XML handling with DOCTYPE
Previous Message David Fetter 2019-08-03 15:56:04 Re: [Patch] Adding CORRESPONDING/CORRESPONDING BY to set operation