Re: Variable length varlena headers redux

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gregory Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variable length varlena headers redux
Date: 2007-02-09 11:33:04
Message-ID: 87mz3nin0v.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Bruce Momjian <bruce(at)momjian(dot)us> writes:

> Bruce Momjian wrote:
> >
> > Uh, I thought the approach was to create type-specific in/out functions,
> > and add casting so every time there were referenced, they would expand
> > to a varlena structure in memory.

Are you talking about actual casts? Because that would lead to all kinds of
strange places with indexes and function lookups and so on. Or are you just
talking about code in the macro api to datum?

> Oh, one more thing. You are going to need to teach the code that walks
> through a tuple attributes about the short header types. I think you
> should set pg_type.typlen = -3 (vs -1 for varlena) and put your macro
> code there too. (As an example, see the macro att_addlength().)

I thought of doing this. It would let us, for example, treat text/varchar,
bpchar, and numeric but leave other data types unchanged.

That does help somewhat but unfortunately text is the problem case. There's
tons of code that generates text without using textin. All of pgcrypto for
example.

> I know it is kind of odd to have a data type that is only used on disk,
> and not in memory, but I see this as a baby varlena type, used only to
> store and get varlena values using less disk space.

I was leaning toward generating the short varlena headers primarily in
heap_form*tuple and just having the datatype specific code generate 4-byte
headers much as you describe.

However that doesn't get us away from having VARDATA/VARSIZE aware of the new
headers. Since heap_deform*tuple and the other entry points which extract
individual attributes return pointers to the datum in the tuple. They can't
expand the header to a 4-byte header on the fly.

I thought of doing it in DETOAST_DATUM on the theory that everyone's going to
be calling it on their arguments. However there are other cases than just
arguments. Other functions might call, say, text_concat() and then call
VARDATA() on the result.

Even if we only ever generate short headers on heap_form*tuple and always
expand them on DETOAST we could have code that passes around tuples that it
"knows" are entirely in memory and therefore not toasted. I'm thinking of
plpgsql here primarily. Perhaps it would be enough to outlaw this behaviour
but it still seems sort of fragile to me.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2007-02-09 11:36:25 Re: pgsql: Add lock matrix to documentation.
Previous Message Heikki Linnakangas 2007-02-09 11:20:03 Re: [PATCHES] [pgsql-patches] Phantom Command IDs,updated patch