Re: Fixed length data types issue

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gregory Stark <gsstark(at)mit(dot)edu>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: Fixed length data types issue
Date: 2006-09-10 23:59:58
Message-ID: 5604.1157932798@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> Either way, I think it would be interesting to consider
>>
>> (a) length word either one or two bytes, not four. You can't need more
>> than 2 bytes for a datum that fits in a disk page ...

> That is an interesting observation, though could compressed inline
> values exceed two bytes?

After expansion, perhaps, but it's the on-disk footprint that concerns
us here.

I thought a bit more about this and came up with a zeroth-order sketch:

The "length word" for an on-disk datum could be either 1 or 2 bytes;
in the 2-byte case we'd need to be prepared to fetch the bytes
separately to avoid alignment issues. The high bits of the first byte
say what's up:

* First two bits 00: 2-byte length word, uncompressed inline data
follows. This allows a maximum on-disk size of 16K for an uncompressed
datum, so we lose nothing at all for standard-size disk pages and not
much for 32K pages (remember the toaster will try to compress any tuple
exceeding 1/4 page anyway ... this just makes it mandatory).

* First two bits 01: 2-byte length word, compressed inline data
follows. Again, hard limit of 16K, so if your data exceeds that you
have to push it out to the toast table. Again, this policy costs zero
for standard size disk pages and not much for 32K pages.

* First two bits 10: 1-byte length word, zero to 62 bytes of
uncompressed inline data follows. This is the case that wins for short
values.

* First two bits 11: 1-byte length word, pointer to out-of-line toast
data follows. We may as well let the low 6 bits of the length word be
the size of the toast pointer, same as it works now. Since the toast
pointer is not guaranteed aligned anymore, we'd have to memcpy it
somewhere before using it ... but compared to the other costs of
fetching a toast value, that's surely down in the noise. The
distinction between compressed and uncompressed toast data would need to
be indicated in the body of the toast pointer, not in the length word as
today, but nobody outside of tuptoaster.c would care.

Notice that heap_deform_tuple only sees 2 cases here: high bit 0 means
2-byte length word, high bit 1 means 1-byte. It doesn't care whether
the data is compressed or toasted, same as today.

There are other ways we could divvy up the bit assignments of course.
The main issue is keeping track of whether any given Datum is in this
compressed-for-disk format or in the uncompressed 4-byte-length-word
format.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-09-11 00:25:17 Re: contrib uninstall scripts need some love
Previous Message Bruce Momjian 2006-09-10 23:57:32 Lock partitions