Re: A varint implementation for PG?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Subject: Re: A varint implementation for PG?
Date: 2021-08-05 13:18:11
Message-ID: 48857269-e536-0083-e3b4-3c4a9323be80@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 8/4/21 7:21 PM, Tomas Vondra wrote:
> On 8/5/21 1:05 AM, Andres Freund wrote:
>
>>
>>> The first one seems quite efficient in how it encodes the length
>>> into very
>>> few bits (which matters especially for small values). It's designed for
>>> integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary
>>> lengths
>>> fairly easily, I think:
>>
>>> Look at the first byte, and
>>>
>>> 0 - 243 - encoded as is
>>> 244 - 1 byte
>>> 245 - 2 bytes
>>> 246 - 3 bytes
>>> 247 - 4 bytes
>>> 248 - 5 bytes
>>> 249 - 6 bytes
>>> 250 - 7 bytes
>>> 251 - 8 bytes
>>> 252 - next 1 byte is length
>>> 253 - next 2 bytes are length
>>> 254 - next 3 bytes are length
>>> 255 - next 4 bytes are length
>>
>>> If we want to support longer lengths, we'd have to reserve an extra
>>> value
>>> (which reduces the number of values that require a single byte).
>>
>> I think that's not a bad scheme. I think it may end up being a bit more
>> expensive to decode because you need more branches instead of using
>> find-first-set type instructions.
>>
>
> I don't think it requires many branches, because you can essentially do
>
>   if (byte[0] <= 243)
>     length = 0
>   else if (byte[0] <= 251)
>     length = byte[0] - 243
>   else
>   {
>     length_bytes = byte[0] - 251;
>     ... read length_bytes, decode length
>   }
>
> but I haven't tried implementing it and maybe my intuition is wrong.
>
> Or maybe it'd be a good scheme for on-disk format, but poor for memory.
>
>
>

This seems like quite an elegant scheme. Certainly worth trying out. I
find it hard to believe that more than 4 length bytes would be needed
(although that reminds me of the famous and possibly apocryphal quote
"640K ought to be enough for anybody.")

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2021-08-05 13:28:37 Re: Lowering the ever-growing heap->pd_lower
Previous Message Tomas Vondra 2021-08-05 13:06:54 Re: Commitfest overflow