Re: Packed short varlenas, what next?

From: Gregory Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Packed short varlenas, what next?
Date: 2007-02-27 15:16:46
Message-ID: 87zm6zeild.fsf@stark.xeocode.com
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > As I have mentioned earlier, I'm missing a plan to allow 8-byte varlena
> > sizes.

Hm, change VARHDRSZ to 8 and change all the varlena data types to have an
int64 leading field? I suppose it could be done, and it would give us more
bits to play with in the codespace since then we could limit 4-byte headers to
128M or something. But yes, there are tons of places in the code that
currently do arithmetic on sizes using integers -- and often signed integers
at that.

But that's a change to what a *detoasted* datum looks like. My patch mainly
changes what a *toasted* datum looks like. (Admittedly after making more data
fall in that category than previously.) The only change to a detoasted datum
is that the size is stored in network byte order.
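To illustrate what "network byte order" means for a 4-byte size field, here is a
minimal sketch. It is not the patch's code: the helper names sketch_set_size
and sketch_get_size are hypothetical, and the only thing taken from the
description above is that the size is written most-significant byte first
regardless of the host's endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */

/* Hypothetical helper: write a 4-byte size into a header buffer in
 * network (big-endian) byte order. */
static void sketch_set_size(unsigned char *hdr, uint32_t size)
{
    uint32_t be;

    assert(hdr != NULL);
    be = htonl(size);              /* host order -> network order */
    memcpy(hdr, &be, sizeof be);   /* memcpy avoids alignment assumptions */
}

/* Hypothetical helper: read the size back into host byte order. */
static uint32_t sketch_get_size(const unsigned char *hdr)
{
    uint32_t be;

    assert(hdr != NULL);
    memcpy(&be, hdr, sizeof be);
    return ntohl(be);              /* network order -> host order */
}
```

With this layout the first byte of the header always holds the high-order
bits of the size, on both little-endian and big-endian machines.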

> For the moment I think it should be enough to expect that the patch
> allow for more than one format of TOAST pointer, so that if we ever did
> try to support 8-byte varlenas, there'd be a way to represent them
> on-disk. Some of the alternatives that we discussed last year used up
> all of the "prefix space" and wouldn't have allowed expansion in this
> particular direction.

Ah yes, I had intended to include the bit-pattern choice in the list as well.

There are two issues there:

1) The lack of 2-byte patterns, which is quite annoying as really *any* on-disk
datum would fit in a 2-byte header varlena. However, it became quite tricky
to convert things to 2-byte headers, especially for compressed data; it
would have made for a much bigger patch to tuptoaster.c and pg_lzcompress.
And I became convinced that it was best to get the most important gain
first: saving 2 bytes on wider tuples is less important than saving 3-6
bytes on narrow tuples.

2) The choice of encoding for toast pointers. Note that currently they don't
actually save *any* space due to the alignment requirements of the OIDs,
which seems kind of silly, but I didn't see any reasonable way around that.
The flip side is that this gives us 24 bits to play with if we want to have
different types of external pointers or more meta-information about the
toasted data.
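The alignment point in item 2 can be sketched as follows. This is only an
illustration, not the patch's actual struct: the type SketchToastPointer, its
field order, and the names valueid and toastrelid are assumptions, and the
numbers hold on typical platforms where a 4-byte integer is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 1-byte header followed by 4-byte fields.
 * The compiler pads after the header so the 4-byte fields are aligned. */
typedef struct SketchToastPointer
{
    uint8_t  header;      /* 1-byte marker identifying a toast pointer */
    /* padding bytes are inserted here on typical platforms */
    uint32_t rawsize;
    uint32_t extsize;
    uint32_t valueid;     /* stands in for an OID (assumed name) */
    uint32_t toastrelid;  /* stands in for an OID (assumed name) */
} SketchToastPointer;

/* Number of spare bits between the header byte and the first aligned field. */
static size_t sketch_spare_bits(void)
{
    size_t pad = offsetof(SketchToastPointer, rawsize) - sizeof(uint8_t);

    assert(pad < sizeof(uint32_t));  /* padding is less than the alignment */
    return pad * 8;
}
```

On such a platform the struct occupies 20 bytes, the same as with a 4-byte
header, and sketch_spare_bits() reports the 24 bits mentioned above.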

One of the details here is that I didn't store the compressed bit anywhere
for external toast pointers. I just made the macro compare the rawsize and
extsize. If that strikes anyone as evil we could take a byte out of those 3
padding bytes for flags and store a compressed flag there.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
