Re: Variable length varlena headers redux

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variable length varlena headers redux
Date: 2007-02-09 03:58:07
Message-ID: 200702090358.l193w7v02893@momjian.us
Lists: pgsql-hackers


Uh, I thought the approach was to create type-specific in/out functions,
and add casting so that every time they were referenced, they would expand
to a varlena structure in memory.

---------------------------------------------------------------------------

Gregory Stark wrote:
>
> I've been looking at this again and had a few conversations about it. This may
> be easier than I had originally thought but there's one major issue that's
> bugging me. Do you see any way to avoid having every user function everywhere
> use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?
>
> The two approaches I see are either
>
> a) To have two sets of macros. One set, VARATT_DATA and VARATT_SIZEP, is
> for constructing new tuples and behaves exactly as it does now, so you always
> construct a four-byte header datum. Then in heap_form*tuple we check whether
> a shorter header can be used and convert. VARDATA/VARSIZE would be for looking
> at existing datums and would interpret the header bits.
>
> This seems very fragile since one stray call site using VARATT_DATA to find
> the data in an existing datum would cause random bugs that only occur rarely
> in certain circumstances. It would even work as long as the size is filled in
> with VARATT_SIZEP first which it usually is, but fail if someone changes the
> order of the statements.
>
> or
>
> b) throw away VARATT_DATA and VARATT_SIZEP and make all user functions
> everywhere change over to a new macro api. That seems like a pretty big
> burden. It's safer but means every contrib module would have to be updated and
> so on.
>
> I'm hoping I'm missing something and there's a way to do this without breaking
> the api for every user function.
>
>
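
[Editorial sketch of approach (a) above, not code from the thread. The caller
always builds a datum with a four-byte header, and a packing step (the role
heap_form*tuple would play) rewrites it with a one-byte header when it is small
enough. The names read_len4 and pack_datum are hypothetical. The 11xxxxxx
one-byte pattern is borrowed from the first encoding in the included mail
below, and storing the high-order byte of the length word first is an
assumption made only to keep the example portable.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 4-byte length word whose top two bits are flags.
 * Assumption: high-order byte is stored first; the length includes the header. */
static size_t read_len4(const uint8_t *p)
{
    return ((size_t)(p[0] & 0x3F) << 24) | ((size_t)p[1] << 16) |
           ((size_t)p[2] << 8) | (size_t)p[3];
}

/* Copy a 4-byte-header datum from src to dst, shrinking the header to one
 * byte (pattern 11xxxxxx) when the uncompressed datum fits in 63 bytes
 * including that byte. Returns the number of bytes written to dst. */
static size_t pack_datum(const uint8_t *src, uint8_t *dst)
{
    size_t total = read_len4(src);
    size_t payload;

    assert(total >= 4);          /* the length word counts itself */
    payload = total - 4;

    if ((src[0] & 0xC0) == 0 && payload + 1 <= 63) {
        dst[0] = (uint8_t)(0xC0 | (payload + 1));
        memcpy(dst + 1, src + 4, payload);
        return payload + 1;
    }

    /* Too long (or flagged as compressed): keep the 4-byte header as is. */
    memcpy(dst, src, total);
    return total;
}
```

[After packing, the data starts at offset 1 instead of offset 4, which is
why a stray call site that assumes a four-byte header on an existing datum
would read the wrong bytes.]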

-- Start of included mail From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> To: Gregory Stark <stark(at)enterprisedb(dot)com>
> cc: Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>,
> Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org,
> Martijn van Oosterhout <kleptog(at)svana(dot)org>
> Subject: Re: [HACKERS] Fixed length data types issue
> Date: Mon, 11 Sep 2006 13:15:43 -0400
> Lines: 64
> Xref: stark.xeocode.com work.enterprisedb:683

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > In any case it seems a bit backwards to me. Wouldn't it be better to
> > preserve bits in the case of short length words where they're precious
> > rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...
>
> Well, I don't find that real persuasive: you're saying that it's
> important to have a 1-byte not 2-byte header for datums between 64 and
> 127 bytes long. Which is by definition less than a 2% savings for those
> values. I think it's more important to pick bitpatterns that reduce
> the number of cases heap_deform_tuple has to think about while decoding
> the length of a field --- every "if" in that inner loop is expensive.
>
> I realized this morning that if we are going to preserve the rule that
> 4-byte-header and compressed-header cases can be distinguished from the
> data alone, there is no reason to be very worried about whether the
> 2-byte cases can represent the maximal length of an in-line datum.
> If you want to do 16K inline (and your page is big enough for that)
> you can just fall back to the 4-byte-header case. So there's no real
> disadvantage if the 2-byte headers can only go up to 4K or so. This
> gives us some more flexibility in the bitpattern choices.
>
> Another thought that occurred to me is that if we preserve the
> convention that a length word's value includes itself, then for a
> 1-byte header the bit pattern 10000000 is meaningless --- the count
> has to be at least 1. So one trick we could play is to take over
> this value as the signal for "toast pointer follows", with the
> assumption that the tuple-decoder code knows a-priori how big a
> toast pointer is. I am not real enamored of this, because it certainly
> adds one case to the inner heap_deform_tuple loop and it'll give us
> problems if we ever want more than one kind of toast pointer. But
> it's a possibility.
>
> Anyway, a couple of encodings that I'm thinking about now involve
> limiting uncompressed data to 1G (same as now), so that we can play
> with the first 2 bits instead of just 1:
>
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> 01xxxxxx 4-byte length word, aligned, compressed data (up to 1G)
> 100xxxxx 1-byte length word, unaligned, TOAST pointer
> 1010xxxx 2-byte length word, unaligned, uncompressed data (up to 4K)
> 1011xxxx 2-byte length word, unaligned, compressed data (up to 4K)
> 11xxxxxx 1-byte length word, unaligned, uncompressed data (up to 63b)
>
> or
>
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> 010xxxxx 2-byte length word, unaligned, uncompressed data (up to 8K)
> 011xxxxx 2-byte length word, unaligned, compressed data (up to 8K)
> 10000000 1-byte length word, unaligned, TOAST pointer
> 1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 127b)
>          (xxxxxxx not all zero)
>
> This second choice allows longer datums in both the 1-byte and 2-byte
> header formats, but it hardwires the length of a TOAST pointer and
> requires four cases to be distinguished in the inner loop; the first
> choice only requires three cases, because TOAST pointer and 1-byte
> header can be handled by the same rule "length is low 6 bits of byte".
> The second choice also loses the ability to store in-line compressed
> data above 8K, but that's probably an insignificant loss.
>
> There's more than one way to do it ...
>
> regards, tom lane
>
-- End of included mail.
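
[Editorial sketch, not code from the thread: one way the first encoding in the
included mail could be decoded with three cases. The struct VarHdrInfo and the
function decode_varhdr are hypothetical names. The sketch assumes that the
first byte of a multi-byte length word holds the high-order bits and that the
length counts the header itself, as the mail proposes.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type; not part of PostgreSQL's API. */
typedef struct {
    size_t header_len;     /* 1, 2 or 4 bytes */
    size_t total_len;      /* length stored in the header */
    int    compressed;     /* nonzero for 01xxxxxx and 1011xxxx */
    int    toast_pointer;  /* nonzero for 100xxxxx */
} VarHdrInfo;

static VarHdrInfo decode_varhdr(const uint8_t *p)
{
    VarHdrInfo info = {0, 0, 0, 0};
    uint8_t b;

    assert(p != NULL);
    b = p[0];

    if ((b & 0x80) == 0) {
        /* 00xxxxxx or 01xxxxxx: 4-byte length word, 30-bit length */
        info.header_len = 4;
        info.compressed = (b & 0x40) != 0;
        info.total_len = ((size_t)(b & 0x3F) << 24) | ((size_t)p[1] << 16) |
                         ((size_t)p[2] << 8) | (size_t)p[3];
    } else if ((b & 0xE0) == 0xA0) {
        /* 1010xxxx or 1011xxxx: 2-byte length word, 12-bit length */
        info.header_len = 2;
        info.compressed = (b & 0x10) != 0;
        info.total_len = ((size_t)(b & 0x0F) << 8) | (size_t)p[1];
    } else {
        /* 100xxxxx (TOAST pointer) or 11xxxxxx: length is low 6 bits of byte */
        info.header_len = 1;
        info.toast_pointer = (b & 0x40) == 0;
        info.total_len = (size_t)(b & 0x3F);
    }
    return info;
}
```

[The last branch covers both one-byte patterns with the same rule, which is
where the "three cases" count in the mail comes from: for 100xxxxx the sixth
bit is always zero, so masking with 0x3F still yields the right length.]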

>
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
