Re: Optimize partial TOAST decompression

From: Binguo Bao <djydewang(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize partial TOAST decompression
Date: 2019-07-10 05:35:25
Message-ID: CAL-OGkt0WOUZnTDSrp3Zmy0GeFNPXai9aq7DYVe-=BxigUycCQ@mail.gmail.com
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote on Wed, Jul 10, 2019 at 5:12 AM:

> On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote:
> >On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote:
> >>Hi, Tomas!
> >>Thanks for your testing and the suggestion.
> >>
> >>>That's quite bizarre behavior - it does work with a prefix, but not with
> >>>suffix. And the exact ERROR changes after the prefix query.
> >>
> >>
> >>I think the bug is caused by "#2 0x00000000004c3b08 in
> >>heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
> >>slicelength=-1) at tuptoaster.c:315",
> >>since I ignored the case where slicelength is negative, and I've added
> >>comments to heap_tuple_untoast_attr_slice covering that case.
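
For reference, a minimal sketch of the handling I mean (not the patch
verbatim; the variable names follow the existing code):

    /*
     * A negative slicelength means "from sliceoffset to the end of the
     * value", so the partial-decompression bound can only be computed
     * when the slice length is known; otherwise we must fetch the
     * whole compressed datum.
     */
    int32   max_size;

    if (slicelength < 0)
        max_size = total_compressed_size;       /* decompress everything */
    else
        max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
                                                total_compressed_size);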
> >>
> >>>FWIW the new code added to this
> >>>function does not adhere to our code style, and would deserve some
> >>>additional explanation of what it's doing/why. Same for the
> >>>heap_tuple_untoast_attr_slice, BTW.
> >>
> >>
> >>I've added more comments to explain the code's behavior.
> >>Besides, I also renamed the macro "TOAST_COMPRESS_RAWDATA" to
> >>"TOAST_COMPRESS_DATA", since it is used to get the TOAST compressed
> >>data rather than the raw data.
> >>
> >
> >Thanks, this seems to address the issue - I can no longer reproduce the
> >crashes, allowing the benchmark to complete. I'm attaching the script I
> >used and spreadsheet with a summary of results.
> >
> >For the cases I've tested (100k - 10M values, different compressibility,
> >different prefix/length values), the results are kinda mixed - many
> >cases got much faster (~2x), but other cases got slower too. We're
> >however talking about queries taking a couple of milliseconds, so in
> >absolute numbers the differences are small.
> >
> >That does not mean the optimization is useless - but the example shared
> >at the beginning of this thread is quite extreme, as the values are
> >extremely compressible. Each value is ~38MB (in plaintext), but a table
> >with 100 such values has only ~40MB. That's a 100:1 compression ratio,
> >which I think is not typical for real-world data sets.
> >
> >The data I've used are less extreme, depending on the fraction of random
> >data in values.
> >
> >I went through the code too. I've reworded a couple of comments and
> >fixed some code style issues, but there are a couple of more serious
> >issues.
> >
> >
> >1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?
> >
> >This seems unnecessary, and it discards the clear hint that it's about
> >accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
> >IMHO we should keep the original naming.
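
For context, the existing macro pair in tuptoaster.c looks roughly like
this (paraphrased from memory, not verbatim):

    /*
     * RAWSIZE reads the header field holding the uncompressed length;
     * RAWDATA skips the header to reach the stored compressed bytes.
     * The shared RAW prefix is the naming hint referred to above.
     */
    #define TOAST_COMPRESS_RAWSIZE(ptr) \
        (((toast_compress_header *) (ptr))->rawsize)
    #define TOAST_COMPRESS_RAWDATA(ptr) \
        (((char *) (ptr)) + TOAST_COMPRESS_HDRSZ)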
> >
> >
> >2) pglz_maximum_compressed_size signatures are confusing
> >
> >There are two places with pglz_maximum_compressed_size signature, and
> >those places are kinda out of sync when it comes to parameter names:
> >
> >  int32
> >  pglz_maximum_compressed_size(int32 raw_slice_size,
> >                               int32 total_compressed_size)
> >
> >  extern
> >  int32 pglz_maximum_compressed_size(int32 raw_slice_size,
> >                                     int32 raw_size);
> >
> >Also, pg_lzcompress.c has no concept of a "slice" because it only deals
> >with simple compression, slicing is the responsibility of the tuptoaster.
> >we should not mix those two, not even in comments.
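
As a sketch of the bound this function computes (the rounding details
here are my assumption, not the patch verbatim): pglz emits one control
byte per group of up to 8 items, and in the worst case every item is a
single literal byte, so reconstructing raw_size output bytes consumes at
most about 9 compressed bytes per 8 raw bytes:

    /*
     * Sketch: upper bound on the compressed bytes needed to reconstruct
     * the first raw_size bytes of output. Worst case is all literals:
     * one control byte plus 8 literal bytes per 8 raw bytes.
     */
    int32
    pglz_maximum_compressed_size(int32 raw_size, int32 total_compressed_size)
    {
        /* int64 arithmetic avoids overflow for large values */
        int32   bound = (int32) (((int64) raw_size * 9 + 8) / 8);

        /* never request more than the whole compressed datum */
        return Min(bound, total_compressed_size);
    }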
> >
> >
> >I propose tweaks per the attached patch - I think it makes the code
> >clearer, and it's mostly cosmetic stuff. But I haven't tested the
> >changes beyond "it compiles".
> >
> >
> >regards
> >
>
> FWIW I've done another round of tests with larger values (up to ~10MB),
> and the larger the values, the better the speedup (much as expected).
> Attached is the script I used and a spreadsheet with a summary.
>
> We still need a patch addressing the review comments, though.
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Hi, Tomas!
Sorry for the late reply.
Thank you for the further testing. I am trying to reproduce your first
test results, since I would expect the patch not to regress performance
in almost all cases.

Besides, a value composed of random characters is hard to compress,
because by default pglz requires at least a 25% compression rate,
otherwise compression is not considered worth it and the value is stored
uncompressed. This means that querying such a value should never trigger
the patch's code path at all. Yet the first test results show the patch
as slower than master for values composed of random characters, which is
confusing.
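
To illustrate that rule, a hypothetical helper (the function name is
made up; the 25% default comes from pglz's default strategy):

    /*
     * pglz keeps the compressed form only if it saves at least
     * min_comp_rate percent of the raw size; otherwise the value is
     * stored uncompressed and partial decompression never applies.
     */
    static bool
    compression_is_worthwhile(int32 raw_size, int32 compressed_size)
    {
        const int32 min_comp_rate = 25;     /* default minimum savings, in % */

        return compressed_size <= raw_size - (raw_size * min_comp_rate) / 100;
    }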

From the second test results, we can infer that the first results were
indeed affected by random noise, given how little time those queries
take.

> We still need a patch addressing the review comments, though.
done:)

Attachment Content-Type Size
0001-Optimize-partial-TOAST-decompression-6.patch text/x-patch 6.2 KB
