Re: Enum proposal / design

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tom Dunstan <pgsql(at)tomd(dot)cc>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enum proposal / design
Date: 2006-08-16 17:33:18
Message-ID: 26936.1155749598@sss.pgh.pa.us
Lists: pgsql-hackers

Tom Dunstan <pgsql(at)tomd(dot)cc> writes:
> Tom Lane wrote:
>> I think this is excessive concern for bit-shaving. Make the on-disk
>> representation be 8 bytes instead of 4, then you can store the OID
>> directly and have no need for the separate identifier concept.

> That's all true. It's a bit depressing to think that IMO 99% of users of
> this will have enum values whose range would fit into 1 byte, but we'll
> be using 8 to store it on disk. I had convinced myself that 4 was ok on
> the basis that alignment issues in surrounding columns would pad out the
> remaining bits anyway much of the time.

Right, and on a 64-bit machine the same frequently holds at the 8-byte
level, so it's not real clear how much you're saving.
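
To make the alignment point concrete, here is a rough sketch of how padding can swallow the difference between a 4-byte and an 8-byte enum. It assumes a simplified row layout in which each column starts at a multiple of its own alignment. The `row_size` helper and the column widths are illustrative only, and tuple headers and trailing padding are ignored.

```python
def row_size(columns):
    """Return the byte size of a row given (width, alignment) pairs per column."""
    offset = 0
    for width, alignment in columns:
        # Round the current offset up to the next multiple of the alignment.
        offset = (offset + alignment - 1) // alignment * alignment
        offset += width
    return offset


# Hypothetical layout: an enum column followed by an 8-byte-aligned column.
with_4_byte_enum = row_size([(4, 4), (8, 8)])
with_8_byte_enum = row_size([(8, 8), (8, 8)])

print(with_4_byte_enum, with_8_byte_enum)
```

Under these assumptions both layouts occupy the same number of bytes, because the 4 bytes saved by the narrower enum are spent on padding before the next column. The saving only materializes when the neighbouring column has 4-byte or smaller alignment.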

> Ok, I'll run one more idea up the flagpole before giving up on a 4 byte
> on disk representation. :) How about assigning a unique 4 byte id to
> each enum value, and storing that on disk. This would be unique across
> the database, not per enum type. The structure of pg_enum would be a bit
> different, as the per-type enum id would be gone, and there would be
> multiple rows for each enum type. The columns would be: the type oid,
> the associated unique id and the textual representation.

That seems not a bad idea. I had been considering complaining that the
array-based catalog structure was denormalized, but refrained ... I like
the fact that this approach makes it normalized.
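
As a minimal sketch of the normalized layout proposed above, the catalog can be modeled as one row per enum value, holding the type OID, a database-wide unique id, and the label. The names `PgEnumRow`, `enum_in`, and `enum_out`, along with all OIDs and ids, are made up for illustration and are not taken from any actual patch.

```python
from collections import namedtuple

# One catalog row per enum value: (type oid, unique id, textual label).
PgEnumRow = namedtuple("PgEnumRow", ["enum_type_oid", "value_id", "label"])

# Hypothetical catalog contents; the OIDs and ids are invented.
PG_ENUM = [
    PgEnumRow(16384, 1001, "red"),
    PgEnumRow(16384, 1002, "green"),
    PgEnumRow(16390, 1003, "small"),
    PgEnumRow(16390, 1004, "large"),
]


def enum_in(type_oid, label):
    """Map a label of a given enum type to the id that would be stored on disk."""
    for row in PG_ENUM:
        if row.enum_type_oid == type_oid and row.label == label:
            return row.value_id
    raise ValueError(f"invalid label {label!r} for enum type {type_oid}")


def enum_out(value_id):
    """Map a stored id back to its label; the id alone suffices because it is unique database-wide."""
    for row in PG_ENUM:
        if row.value_id == value_id:
            return row.label
    raise ValueError(f"unknown enum value id {value_id}")
```

Because the id is unique across the database rather than per type, the output direction needs no type OID at all, which is what lets the per-type identifier column go away.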

Another thought is that this isn't really tied to any particular width
of stored enum values. You could easily imagine a compile-time switch
to say you want 2-byte enums instead of 4. Or 8; or even 1.

Even more radical: do it at runtime. You could assign the typlen
(stored width) of an enum type at creation time on the basis of the
largest identifier it contains. This might be a bit too weird because
enums created earlier would have a size advantage over those created
later, but if you are looking to shave space ...
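
A rough sketch of the runtime idea: pick the narrowest width that can hold the largest identifier the type contains at creation time. The function name `typlen_for` is hypothetical, and the sketch assumes identifiers are stored as unsigned integers in widths of 1, 2, 4, or 8 bytes.

```python
def typlen_for(max_identifier):
    """Return the smallest width in bytes (1, 2, 4, or 8) that can hold max_identifier.

    Assumes identifiers are non-negative and stored unsigned.
    """
    if max_identifier < 0:
        raise ValueError("identifiers are assumed to be non-negative")
    for width in (1, 2, 4, 8):
        if max_identifier < 2 ** (8 * width):
            return width
    raise ValueError("identifier does not fit in 8 bytes")


# An enum created early, while ids are still small, gets a narrow width;
# one created after many ids have been handed out gets a wider one.
early_enum_width = typlen_for(200)
late_enum_width = typlen_for(70000)
```

This shows the asymmetry mentioned above: with database-wide ids, the width a type gets depends on how many enum values were created before it, not on how many values the type itself has.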

That reminds me: were you intending to allow an ALTER ENUM operation
to add (or remove, or rename) elements of an enum type? The above
method would fail for the case where an ADD operation needed to assign
an identifier wider than the type allowed for.
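
The failure case can be sketched as follows, assuming an enum type whose stored width was fixed at creation time and whose identifiers are unsigned. `EnumType` and `add_value` are illustrative names only, not a proposed interface.

```python
class EnumType:
    """Toy model of an enum type whose stored width is fixed at creation time."""

    def __init__(self, typlen, value_ids):
        self.typlen = typlen              # stored width in bytes
        self.value_ids = list(value_ids)  # identifiers already assigned

    def add_value(self, new_id):
        """Model an ADD operation: reject ids that do not fit in typlen bytes."""
        limit = 2 ** (8 * self.typlen)    # unsigned storage assumed
        if new_id >= limit:
            raise OverflowError(
                f"id {new_id} does not fit in {self.typlen} byte(s)"
            )
        self.value_ids.append(new_id)


color = EnumType(typlen=1, value_ids=[10, 20])
color.add_value(255)        # still fits in one byte
try:
    color.add_value(256)    # would need a second byte
except OverflowError as exc:
    print("ADD failed:", exc)
```

With a fixed width, the only ways out at that point would be to refuse the ADD or to change the type's width, and the latter would presumably mean rewriting every stored value of that type.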

regards, tom lane
