Re: Optimize partial TOAST decompression

From: Binguo Bao <djydewang(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize partial TOAST decompression
Date: 2019-07-05 18:27:56
Message-ID: CAL-OGkuvHceCv96H8PHk+GeDn5NxuojGhQQati7Z7uFHduZVBg@mail.gmail.com
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote on Fri, Jul 5, 2019 at 1:46 AM:

> I've done a bit of testing and benchmarking on this patch today, and
> there's a bug somewhere, making it look like there are corrupted data.
>
> What I'm seeing is this:
>
> CREATE TABLE t (a text);
>
> -- attached is data for one row
> COPY t FROM '/tmp/t.data';
>
>
> SELECT length(substr(a,1000)) from t;
> psql: ERROR: compressed data is corrupted
>
> SELECT length(substr(a,0,1000)) from t;
> length
> --------
> 999
> (1 row)
>
> SELECT length(substr(a,1000)) from t;
> psql: ERROR: invalid memory alloc request size 2018785106
>
> That's quite bizarre behavior - it does work with a prefix, but not with
> suffix. And the exact ERROR changes after the prefix query. (Of course,
> on master it works in all cases.)
>
> The backtrace (with the patch applied) looks like this:
>
> #0 toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2291
> #1 toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2277
> #2 0x00000000004c3b08 in heap_tuple_untoast_attr_slice (attr=<optimized
> out>, sliceoffset=0, slicelength=-1) at tuptoaster.c:315
> #3 0x000000000085c1e5 in pg_detoast_datum_slice (datum=<optimized out>,
> first=<optimized out>, count=<optimized out>) at fmgr.c:1767
> #4 0x0000000000833b7a in text_substring (str=133761519127512, start=0,
> length=<optimized out>, length_not_specified=<optimized out>) at
> varlena.c:956
> ...
>
> I've only observed this with a very small number of rows (the data is
> generated randomly with different compressibility etc.), so I'm only
> attaching one row that exhibits this issue.
>
> My guess is toast_fetch_datum_slice() gets confused by the headers or
> something like that. FWIW the new code added to this
> function does not adhere to our code style, and would deserve some
> additional explanation of what it's doing/why. Same for the
> heap_tuple_untoast_attr_slice, BTW.
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Hi, Tomas!
Thanks for your testing and the suggestion.

> That's quite bizarre behavior - it does work with a prefix, but not with
> suffix. And the exact ERROR changes after the prefix query.

I think the bug is caused by the call in frame "#2 0x00000000004c3b08 in
heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
slicelength=-1) at tuptoaster.c:315",
since I ignored the case where slicelength is negative. I've added
some comments to heap_tuple_untoast_attr_slice covering that case.
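
To illustrate the case, here is a minimal standalone sketch (not the code
from the attached patch). The names sketch_max_compressed_size and
sketch_bytes_to_fetch are hypothetical, and the worst-case bound of 9 bits
per raw byte plus 2 bytes of slack is an assumption made for illustration.
The point is only that a negative slicelength means "everything up to the
end of the value", so the whole compressed datum has to be fetched:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on the number of compressed bytes
 * needed to decompress the first rawsize bytes. ASSUMPTION for this
 * sketch: every raw byte costs at most 9 bits (one literal byte plus one
 * control bit), plus 2 bytes of slack, capped at the full compressed size.
 */
static int32_t
sketch_max_compressed_size(int64_t rawsize, int32_t total_compressed_size)
{
	int64_t		bound = (rawsize * 9 + 7) / 8 + 2;

	assert(rawsize >= 0 && total_compressed_size >= 0);
	return bound < total_compressed_size ? (int32_t) bound
		: total_compressed_size;
}

/*
 * Hypothetical helper: how many compressed bytes to fetch for a slice.
 * A negative slicelength (or offset) must not be fed into the bound
 * above; in that case fall back to fetching the whole compressed datum.
 */
static int32_t
sketch_bytes_to_fetch(int32_t sliceoffset, int32_t slicelength,
					  int32_t total_compressed_size)
{
	if (sliceoffset < 0 || slicelength < 0)
		return total_compressed_size;

	/* add in 64 bits so offset + length cannot overflow */
	return sketch_max_compressed_size((int64_t) sliceoffset + slicelength,
									  total_compressed_size);
}
```

With these assumptions, a prefix request such as sliceoffset=0,
slicelength=999 needs only part of a 5000-byte compressed datum, while the
sliceoffset=0, slicelength=-1 call from the backtrace needs all of it.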

> FWIW the new code added to this
> function does not adhere to our code style, and would deserve some
> additional explanation of what it's doing/why. Same for the
> heap_tuple_untoast_attr_slice, BTW.

I've added more comments to explain the code's behavior.
I also renamed the macro "TOAST_COMPRESS_RAWDATA" to
"TOAST_COMPRESS_DATA", since
it is used to get the TOAST compressed data rather than the raw data.

Best Regards, Binguo Bao.

Attachment Content-Type Size
0001-Optimize-partial-TOAST-decompression-5.patch text/x-patch 6.8 KB
