Re: Variable length varlena headers redux

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gregory Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variable length varlena headers redux
Date: 2007-02-09 05:35:08
Message-ID: 200702090535.l195Z8C07463@momjian.us
Lists: pgsql-hackers

Bruce Momjian wrote:
>
> Uh, I thought the approach was to create type-specific in/out functions,
> and add casting so every time they were referenced, they would expand
> to a varlena structure in memory.

Oh, one more thing. You are going to need to teach the code that walks
through a tuple's attributes about the short header types. I think you
should set pg_type.typlen = -3 (vs -1 for varlena) and put your macro
code there too. (As an example, see the macro att_addlength().)

I know it is kind of odd to have a data type that is only used on disk,
and not in memory, but I see this as a baby varlena type, used only to
store and get varlena values using less disk space.
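
To make the typlen idea concrete, here is a rough standalone sketch of the
kind of dispatch the tuple-walking code would need. It is not the real
att_addlength() macro. The name sketch_addlength(), the TYPLEN_* constants,
and the header layouts are assumptions for illustration only: the 4-byte
header is treated as a plain native-endian total length, and the short
header as a single byte holding the total length including itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* -1 and -2 follow the existing pg_type.typlen conventions;
 * -3 is the value proposed in this message (hypothetical). */
#define TYPLEN_VARLENA       (-1)
#define TYPLEN_CSTRING       (-2)
#define TYPLEN_SHORTVARLENA  (-3)

/*
 * Return the offset just past the attribute stored at attptr.
 * Assumed layouts (illustration only, not the on-disk format):
 *   varlena:       4-byte native-endian total length, header included
 *   short varlena: 1-byte total length, header included
 */
static size_t
sketch_addlength(size_t off, int typlen, const unsigned char *attptr)
{
    if (typlen > 0)
        return off + (size_t) typlen;        /* fixed-width type */

    if (typlen == TYPLEN_VARLENA)
    {
        uint32_t len;

        memcpy(&len, attptr, sizeof(len));   /* avoid unaligned reads */
        return off + len;
    }

    if (typlen == TYPLEN_SHORTVARLENA)
        return off + attptr[0];              /* one-byte length word */

    /* Only the cstring case remains. */
    assert(typlen == TYPLEN_CSTRING);
    return off + strlen((const char *) attptr) + 1;
}
```

The point is only that the short-header type adds one more branch next to
the existing varlena and cstring cases; the actual header format is still
under discussion in this thread.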

---------------------------------------------------------------------------
>
> Gregory Stark wrote:
> >
> > I've been looking at this again and had a few conversations about it. This may
> > be easier than I had originally thought but there's one major issue that's
> > bugging me. Do you see any way to avoid having every user function everywhere
> > use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?
> >
> > The two approaches I see are either
> >
> > a) To have two sets of macros, one of which, VARATT_DATA and VARATT_SIZEP, is
> > for constructing new tuples and behaves exactly as it does now. So you always
> > construct a four-byte header datum. Then in heap_form*tuple we check if you
> > can use a shorter header and convert. VARDATA/VARSIZE would be for looking at
> > existing datums and would interpret the header bits.
> >
> > This seems very fragile since one stray call site using VARATT_DATA to find
> > the data in an existing datum would cause random bugs that only occur rarely
> > in certain circumstances. It would even work as long as the size is filled in
> > with VARATT_SIZEP first which it usually is, but fail if someone changes the
> > order of the statements.
> >
> > or
> >
> > b) throw away VARATT_DATA and VARATT_SIZEP and make all user functions
> > everywhere change over to a new macro api. That seems like a pretty big
> > burden. It's safer but means every contrib module would have to be updated and
> > so on.
> >
> > I'm hoping I'm missing something and there's a way to do this without breaking
> > the api for every user function.
> >
> >
>
> -- Start of included mail From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>
> > To: Gregory Stark <stark(at)enterprisedb(dot)com>
> > cc: Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>,
> > Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org,
> > Martijn van Oosterhout <kleptog(at)svana(dot)org>
> > Subject: Re: [HACKERS] Fixed length data types issue
> > Date: Mon, 11 Sep 2006 13:15:43 -0400
> > Lines: 64
> > Xref: stark.xeocode.com work.enterprisedb:683
>
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > > In any case it seems a bit backwards to me. Wouldn't it be better to
> > > preserve bits in the case of short length words where they're precious
> > > rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...
> >
> > Well, I don't find that real persuasive: you're saying that it's
> > important to have a 1-byte not 2-byte header for datums between 64 and
> > 127 bytes long. Which is by definition less than a 2% savings for those
> > values. I think it's more important to pick bitpatterns that reduce
> > the number of cases heap_deform_tuple has to think about while decoding
> > the length of a field --- every "if" in that inner loop is expensive.
> >
> > I realized this morning that if we are going to preserve the rule that
> > 4-byte-header and compressed-header cases can be distinguished from the
> > data alone, there is no reason to be very worried about whether the
> > 2-byte cases can represent the maximal length of an in-line datum.
> > If you want to do 16K inline (and your page is big enough for that)
> > you can just fall back to the 4-byte-header case. So there's no real
> > disadvantage if the 2-byte headers can only go up to 4K or so. This
> > gives us some more flexibility in the bitpattern choices.
> >
> > Another thought that occurred to me is that if we preserve the
> > convention that a length word's value includes itself, then for a
> > 1-byte header the bit pattern 10000000 is meaningless --- the count
> > has to be at least 1. So one trick we could play is to take over
> > this value as the signal for "toast pointer follows", with the
> > assumption that the tuple-decoder code knows a-priori how big a
> > toast pointer is. I am not real enamored of this, because it certainly
> > adds one case to the inner heap_deform_tuple loop and it'll give us
> > problems if we ever want more than one kind of toast pointer. But
> > it's a possibility.
> >
> > Anyway, a couple of encodings that I'm thinking about now involve
> > limiting uncompressed data to 1G (same as now), so that we can play
> > with the first 2 bits instead of just 1:
> >
> > 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> > 01xxxxxx 4-byte length word, aligned, compressed data (up to 1G)
> > 100xxxxx 1-byte length word, unaligned, TOAST pointer
> > 1010xxxx 2-byte length word, unaligned, uncompressed data (up to 4K)
> > 1011xxxx 2-byte length word, unaligned, compressed data (up to 4K)
> > 11xxxxxx 1-byte length word, unaligned, uncompressed data (up to 63b)
> >
> > or
> >
> > 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> > 010xxxxx 2-byte length word, unaligned, uncompressed data (up to 8K)
> > 011xxxxx 2-byte length word, unaligned, compressed data (up to 8K)
> > 10000000 1-byte length word, unaligned, TOAST pointer
> > 1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 127b)
> > (xxxxxxx not all zero)
> >
> > This second choice allows longer datums in both the 1-byte and 2-byte
> > header formats, but it hardwires the length of a TOAST pointer and
> > requires four cases to be distinguished in the inner loop; the first
> > choice only requires three cases, because TOAST pointer and 1-byte
> > header can be handled by the same rule "length is low 6 bits of byte".
> > The second choice also loses the ability to store in-line compressed
> > data above 8K, but that's probably an insignificant loss.
> >
> > There's more than one way to do it ...
> >
> > regards, tom lane
> >
> -- End of included mail.
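
For reference, the first of the two encodings in the included mail could be
decoded roughly as follows. This is only a sketch: SketchHeader and
sketch_decode() are made-up names, and the byte order of the multi-byte
length words (most significant byte first, so the flag bits sit in the first
byte) is an assumption the mail does not state. The returned length is the
raw field value; whether it includes the header itself is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for illustration. */
typedef struct
{
    unsigned hdr_bytes;      /* size of the length word: 1, 2 or 4 */
    uint32_t length;         /* raw value of the length field */
    int      compressed;     /* nonzero if the data is compressed */
    int      toast_pointer;  /* nonzero if a TOAST pointer follows */
} SketchHeader;

/*
 * Decode the first proposed bit layout:
 *   00xxxxxx / 01xxxxxx  4-byte word, uncompressed / compressed
 *   1010xxxx / 1011xxxx  2-byte word, uncompressed / compressed
 *   100xxxxx / 11xxxxxx  1-byte word, TOAST pointer / short datum
 * Only three branches are needed, because both 1-byte forms use the
 * rule "length is low 6 bits of byte".
 */
static SketchHeader
sketch_decode(const unsigned char *p)
{
    SketchHeader  h = {0, 0, 0, 0};
    unsigned char b = p[0];

    if ((b & 0x80) == 0)
    {
        /* 4-byte header: 30 length bits, assumed big-endian */
        h.hdr_bytes = 4;
        h.compressed = (b & 0x40) != 0;
        h.length = ((uint32_t) (b & 0x3F) << 24) |
                   ((uint32_t) p[1] << 16) |
                   ((uint32_t) p[2] << 8) |
                   (uint32_t) p[3];
    }
    else if ((b & 0xE0) == 0xA0)
    {
        /* 2-byte header: 12 length bits, so up to 4K */
        h.hdr_bytes = 2;
        h.compressed = (b & 0x10) != 0;
        h.length = ((uint32_t) (b & 0x0F) << 8) | (uint32_t) p[1];
    }
    else
    {
        /* 1-byte header: TOAST pointer (100xxxxx) or short datum (11xxxxxx) */
        h.hdr_bytes = 1;
        h.toast_pointer = (b & 0x40) == 0;
        h.length = b & 0x3F;
    }

    assert(h.hdr_bytes != 0);
    return h;
}
```

The second encoding would need a fourth branch to recognize the fixed
10000000 byte, which is the extra inner-loop case the mail points out.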
>
> >
> >
> > --
> > Gregory Stark
> > EnterpriseDB http://www.enterprisedb.com
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
