Re: Optimize partial TOAST decompression

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Binguo Bao <djydewang(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize partial TOAST decompression
Date: 2019-07-09 21:12:19
Message-ID: 20190709211219.hbo6tyfxxcy2ek3j@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote:
>On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote:
>>Hi, Tomas!
>>Thanks for your testing and the suggestion.
>>
>>That's quite bizarre behavior - it does work with a prefix, but not with
>>>suffix. And the exact ERROR changes after the prefix query.
>>
>>
>>I think bug is caused by "#2 0x00000000004c3b08 in
>>heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
>>slicelength=-1) at tuptoaster.c:315",
>>since I ignore the case where slicelength is negative, and I've appended
>>some comments for heap_tuple_untoast_attr_slice for the case.
>>
>>FWIW the new code added to this
>>>function does not adhere to our code style, and would deserve some
>>>additional explanation of what it's doing/why. Same for the
>>>heap_tuple_untoast_attr_slice, BTW.
>>
>>
>>I've added more comments to explain the code's behavior.
>>Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to
>>"TOAST_COMPRESS_DATA" since
>>it is used to get toast compressed data rather than raw data.
>>
>
>Thanks, this seems to address the issue - I can no longer reproduce the
>crashes, allowing the benchmark to complete. I'm attaching the script I
>used and spreadsheet with a summary of results.
>
>For the cases I've tested (100k - 10M values, different compressibility,
>different prefix/length values), the results are kinda mixed - many
>cases got much faster (~2x), but other cases got slower too. We're
>however talking about queries taking a couple of miliseconds, so in
>absolute numbers the differences are small.
>
>That does not mean the optimization is useless - but the example shared
>at the beginning of this thread is quite extreme, as the values are
>extremely compressible. Each value is ~38MB (in plaintext), but a table
>with 100 such values has only ~40MB. Tha's 100:1 compression ratio,
>which I think is not typical for real-world data sets.
>
>The data I've used are less extreme, depending on the fraction of random
>data in values.
>
>I went through the code too. I've reworder a couple of comments and code
>style issues, but there are a couple of more serious issues.
>
>
>1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?
>
>This seems unnecessary, and it discards the clear hint that it's about
>accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
>IMHO we should keep the original naming.
>
>
>2) pglz_maximum_compressed_size signatures are confusing
>
>There are two places with pglz_maximum_compressed_size signature, and
>those places are kinda out of sync when it comes to parameter names:
>
> int32
> pglz_maximum_compressed_size(int32 raw_slice_size,
> int32 total_compressed_size)
>
> extern
> int32 pglz_maximum_compressed_size(int32 raw_slice_size,
> int32 raw_size);
>
>Also, pg_lzcompress.c has no concept of a "slice" because it only deals
>with simple compression, slicing is responsibility of the tuptoaster. So
>we should not mix those two, not even in comments.
>
>
>I propose tweaks per the attached patch - I think it makes the code
>clearer, and it's mostly cosmetic stuff. But I haven't tested the
>changes beyond "it compiles".
>
>
>regards
>

FWIW I've done another round of tests with larger values (up to ~10MB)
and the larger the values the better the speedup (a bit as expected).
Attached is the script I used and a spreasheet with a summary.

We still need a patch addressing the review comments, though.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
random-bench.sh application/x-sh 2.3 KB
results-large.ods application/vnd.oasis.opendocument.spreadsheet 100.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-07-09 21:22:27 Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development
Previous Message Alvaro Herrera 2019-07-09 21:06:45 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)