Re: Optimize partial TOAST decompression

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Binguo Bao <djydewang(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize partial TOAST decompression
Date: 2019-10-01 06:20:39
Message-ID: 123EF56B-F8EC-4868-B49B-095795095E7A@yandex-team.ru
Lists: pgsql-hackers

> On 30 Sep 2019, at 22:29, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:
>>
>>
>>> On 30 Sep 2019, at 20:56, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>
>>> I mean this:
>>>
>>> /*
>>> * Use int64 to prevent overflow during calculation.
>>> */
>>> compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;
>>>
>>> I'm not very familiar with pglz internals, but I'm a bit puzzled by
>>> this. My first instinct was to compare it to this:
>>>
>>> #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)
>>>
>>> but clearly that's a very different (much simpler) formula. So why
>>> shouldn't pglz_maximum_compressed_size simply use this macro?
>
>>
>> compressed_size accounts for possible increase of size during
>> compression. pglz can consume up to 1 control byte for each 8 bytes of
>> data in worst case.
>
> OK, but does that actually translate into the formula? We essentially
> need to count 8-byte chunks in raw data, and multiply that by 9. Which
> gives us something like
>
> nchunks = ((rawsize + 7) / 8) * 9;
>
> which is not quite what the patch does.

I'm afraid neither formula is exactly correct, but the differences are hair-splitting.

Your formula does not account for the fact that we may not need all bytes from the last chunk.
Consider a desired decompressed size of 3 bytes. We may need 1 control byte and 3 literals, 4 bytes total.
But nchunks = 9.

Binguo's formula adds 1 control bit per data byte plus one extra control byte.
Consider size = 8 bytes. We need 1 control byte and 8 literals, 9 bytes total.
But compressed_size = 10.

The mathematically correct formula is
compressed_size = (int32) ((int64) rawsize * 9 + 7) / 8;
Here we take one control bit for each data byte, and add 7 bits so that the division by 8 rounds up to a whole byte.

But these formulas make no big difference in practice; each one is a safe upper bound. I'd pick the one that is easiest to understand and document (IMO, it's nchunks = ((rawsize + 7) / 8) * 9).
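
To make the comparison concrete, here is a minimal C sketch of the three bounds discussed above. The function names (bound_patch, bound_chunks, bound_exact, check_examples) are hypothetical and exist only for illustration; only the arithmetic comes from this thread. As an assumption of the sketch, each function returns int64_t and applies no narrowing cast, so the 64-bit intermediate is kept through the division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; only the arithmetic comes from the thread. */

/* Patch formula: one control bit per data byte plus one extra control byte. */
static int64_t
bound_patch(int32_t rawsize)
{
    return ((int64_t) rawsize * 9 + 8) / 8;
}

/* Chunk formula: every started 8-byte chunk is charged a full 9 bytes. */
static int64_t
bound_chunks(int32_t rawsize)
{
    return (((int64_t) rawsize + 7) / 8) * 9;
}

/* Exact formula: 9 bits per data byte, rounded up to whole bytes. */
static int64_t
bound_exact(int32_t rawsize)
{
    return ((int64_t) rawsize * 9 + 7) / 8;
}

/* Re-checks the two worked examples and the "each formula is safe" claim. */
void
check_examples(void)
{
    int32_t n;

    /* 3 raw bytes: 1 control byte + 3 literals = 4 bytes. */
    assert(bound_exact(3) == 4);
    assert(bound_chunks(3) == 9);

    /* 8 raw bytes: 1 control byte + 8 literals = 9 bytes. */
    assert(bound_exact(8) == 9);
    assert(bound_patch(8) == 10);

    /* The other two formulas never fall below the exact one. */
    for (n = 0; n <= 100000; n++)
    {
        assert(bound_patch(n) >= bound_exact(n));
        assert(bound_chunks(n) >= bound_exact(n));
    }
}
```

Calling check_examples() should pass silently: over the tested range, the patch formula and the chunk formula only ever overestimate relative to the exact one.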

Thanks!

--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud
