Re: Something's been bugging me

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Something's been bugging me
Date: 2007-09-29 15:20:30
Message-ID: 87fy0xv6pt.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> I'm wondering whether it doesn't make sense to lower VARATT_SHORT_MAX to 0x70
>> to allow for at least a small number of constant values which could indicate
>> some special type of datum. That could be used to indicate that a fixed size
>> pointer like a toast pointer follows. That could be used for something like
>> common value compression. [*]
>
> I'm not for this because it would complicate the already-too-complicated
> inner-loop tests for deciding which form of datum you're looking at.
>
> The idea that I recall mentioning was to expend another byte in TOAST
> pointers to make them self-identifying, ie, instead of 0x80 or 0x01
> signaling something that *must* be a 17-byte toast pointer, that bit
> pattern signals "something else" and the content of the next byte
> lets you know what. So TOAST pointers would take 18 bytes instead of
> 17, and there would be room for additions of other sorts of pointers.

Hm, wouldn't that be just as expensive though? You would still have to look at
the next byte and check it against various values to see what length to skip
over. Hm, unless we put the length in the following byte. Also, the difference
between ((first-byte ^ 0x80) < 0x70) and ((first-byte & 0x80) == 0x80) seems
like it's going to be pretty slight.
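
A minimal sketch of the two tests being compared, assuming the big-endian header layout in which a set high bit marks a 1-byte header and the low 7 bits hold the length. All names here (sketch_is_1b, sketch_is_1b_short, SKETCH_RESERVED_START) are hypothetical and are not taken from postgres.h; the point is only that each test is a single mask-or-xor plus one comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: 7-bit lengths at or above this would be reserved. */
#define SKETCH_RESERVED_START 0x70

/* Existing-style test: any byte with the high bit set is a 1-byte header. */
bool
sketch_is_1b(uint8_t first_byte)
{
    return (first_byte & 0x80) == 0x80;
}

/*
 * Proposed-style test: high bit set AND 7-bit length below 0x70.
 * The XOR clears the high bit when it is set; when it is not set, the XOR
 * sets it, so the value becomes >= 0x80 and the comparison fails.
 */
bool
sketch_is_1b_short(uint8_t first_byte)
{
    return (uint8_t) (first_byte ^ 0x80) < SKETCH_RESERVED_START;
}

/* Count the header byte values that the two tests classify differently. */
int
sketch_count_disagreements(void)
{
    int n = 0;

    for (int b = 0; b <= 0xFF; b++)
    {
        if (sketch_is_1b((uint8_t) b) != sketch_is_1b_short((uint8_t) b))
            n++;
    }
    return n;
}

/* Spot checks on a few representative header bytes. */
void
sketch_self_check(void)
{
    assert(sketch_is_1b(0x85) && sketch_is_1b_short(0x85));
    assert(sketch_is_1b(0xF3) && !sketch_is_1b_short(0xF3));
    assert(!sketch_is_1b(0x05) && !sketch_is_1b_short(0x05));
}
```

Under these assumptions the two tests differ only on the header bytes 0xF0 through 0xFF, which are exactly the values that would be freed up for special datum types.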

I suppose we don't have to decide now. We could just put a single padding byte
containing 0 (or 17 or 18, though I think 0 is safest) at the front of the
toast pointer structure for now -- we don't have to actually check what it
contains yet.
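
As a rough illustration of the two options for the extra byte, here is a sketch of a "how many bytes do I skip" routine for 1-byte-header datums, again assuming the big-endian layout where 0x80 marks a toast pointer. The function and macro names are hypothetical, 4-byte headers are deliberately left out, and the tag value 0 and size 18 are simply the numbers mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EXTERNAL_HDR 0x80  /* header byte meaning "something else" */
#define SKETCH_TAG_TOAST    0     /* hypothetical tag for a toast pointer */
#define SKETCH_TOAST_SIZE   18    /* 17-byte pointer plus the extra byte */

/*
 * Variant A: the second byte is a tag that must be dispatched on.
 * Returns the total size to skip, or 0 if the datum is not recognized.
 */
size_t
sketch_datum_size_tagged(const uint8_t *p)
{
    assert(p != NULL);

    if (p[0] == SKETCH_EXTERNAL_HDR)
    {
        switch (p[1])
        {
            case SKETCH_TAG_TOAST:
                return SKETCH_TOAST_SIZE;
            default:
                return 0;           /* unknown pointer kind */
        }
    }
    if (p[0] & 0x80)
        return p[0] & 0x7F;         /* short varlena: length includes header */
    return 0;                       /* 4-byte headers are not handled here */
}

/*
 * Variant B: the second byte stores the total length directly, so skipping
 * needs no dispatch on the kind of pointer.
 */
size_t
sketch_datum_size_len(const uint8_t *p)
{
    assert(p != NULL);

    if (p[0] == SKETCH_EXTERNAL_HDR)
        return p[1];
    if (p[0] & 0x80)
        return p[0] & 0x7F;
    return 0;
}
```

Variant A has to grow a new case for every new kind of pointer, while variant B can skip kinds it has never heard of, which is the trade-off behind choosing 0 versus 17 or 18 for the byte.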

For that matter, we could lower VARATT_SHORT_MAX now so that we never generate
short varlenas over 0x70 in length, without actually checking for them in
VARATT_IS_1B() yet either.
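
A sketch of what the writer-side half of that could look like, assuming the short length counts the 1-byte header itself and using the big-endian header form. SKETCH_SHORT_MAX and both function names are hypothetical stand-ins, not the real macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SHORT_MAX   0x70  /* lowered limit suggested above (was 0x7F) */
#define SKETCH_HDRSZ_SHORT 1     /* size of the 1-byte header */

/* Would a payload of this size still fit the (lowered) short form? */
bool
sketch_can_make_short(size_t payload_len)
{
    return payload_len + SKETCH_HDRSZ_SHORT <= SKETCH_SHORT_MAX;
}

/* Build the 1-byte header: high bit set, total length in the low 7 bits. */
unsigned char
sketch_short_header(size_t payload_len)
{
    assert(sketch_can_make_short(payload_len));
    return (unsigned char) (0x80 | (payload_len + SKETCH_HDRSZ_SHORT));
}
```

Note that this sketch treats a total length of exactly 0x70 as still allowed, whereas a reader test of the form ((first-byte ^ 0x80) < 0x70) would exclude it; which side of that boundary is reserved would need to be pinned down before any reader started checking.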

The choice of strategy might depend on what we're trying to encode in there. I
was picturing using a single byte for up to 256 common values, and for that it
might be unfortunate if we need two more bytes of overhead. On the other hand,
something else I was pondering was doing some form of LZ compression using
some global dictionary, in which case one more byte is not going to matter at
all.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
