Re: Optimize partial TOAST decompression

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Binguo Bao <djydewang(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize partial TOAST decompression
Date: 2019-09-30 15:56:07
Message-ID: 20190930155607.5tnyzksijgm4imbg@development
Lists: pgsql-hackers

On Fri, Sep 27, 2019 at 01:00:36AM +0200, Tomas Vondra wrote:
>On Wed, Sep 25, 2019 at 05:38:34PM -0300, Alvaro Herrera wrote:
>>Hello, can you please update this patch?
>>
>
>I'm not the patch author, but I've been looking at the patch recently
>and I have a rebased version at hand - so attached.
>
>FWIW I believe the patch is solid and in good shape, and it got stuck
>after I reported some benchmarks showing somewhat flaky performance. I
>know Binguo Bao was trying to reproduce the benchmark, and I assume the
>silence means he was not successful :-(
>
>On the larger data set the patch however performed very nicely, so maybe
>I just did something stupid while running the smaller one, or maybe it's
>just noise (the queries were just a couple of ms in that test). I do
>plan to rerun the benchmarks and investigate a bit - if I find the patch
>is fine, I'd like to commit it shortly.
>

OK, I was just about to push this after polishing it a bit, but then I
noticed the patch does not address one of the points from Paul's review,
asking for a comment explaining the pglz_maximum_compressed_size formula.

I mean this:

    /*
     * Use int64 to prevent overflow during calculation.
     */
    compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to the existing macro:

#define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?
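
Just to have something concrete to look at, here's a tiny standalone
sketch (my own throwaway program, not part of the patch) that prints
both bounds for a few raw sizes. It assumes - and I may be misreading
the pglz format - that each output item costs one control bit, with
control bits grouped eight per control byte, so a completely
incompressible chunk needs roughly 9 bytes of output per 8 bytes of
input:

#include <stdio.h>

/* buffer-sizing bound from pg_lzcompress.h (whole-datum compression) */
#define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)

/*
 * The patch's per-prefix bound: rawsize literal bytes plus roughly
 * rawsize/8 control bytes, assuming the worst case of no matches at all.
 */
static int
prefix_bound(int rawsize)
{
    return (int) (((long long) rawsize * 9 + 8) / 8);
}

int
main(void)
{
    int sizes[] = {8, 100, 2000, 1024 * 1024};

    for (int i = 0; i < 4; i++)
        printf("rawsize=%d  patch bound=%d  PGLZ_MAX_OUTPUT=%d\n",
               sizes[i], prefix_bound(sizes[i]), PGLZ_MAX_OUTPUT(sizes[i]));

    return 0;
}

If that reading is right, the two bounds quickly diverge - the macro
stays at rawsize + 4 while the patch formula grows to about 9/8 of
rawsize - which would explain why the macro can't simply be reused
here. But that's exactly the kind of thing the comment should spell
out.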

Regarding benchmarks - as I mentioned, I've repeated the tests and
everything seems fine. The results from the two usual machines are
available in [1]. There are only a few noise-level regressions and
many very significant speedups.

I have a theory about what went wrong in the first run that showed some
regressions - it's possible the machine had CPU power management
enabled. I can't check this retroactively, but it would explain the
variability on short queries and the smooth behavior on longer queries.

[1] https://github.com/tvondra/toast-optimize

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
