Re: Enum proposal / design

From: Tom Dunstan <pgsql(at)tomd(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enum proposal / design
Date: 2006-08-16 17:17:18
Message-ID: 44E3531E.4090103@tomd.cc
Lists: pgsql-hackers

Tom Lane wrote:
> Tom Dunstan <pgsql(at)tomd(dot)cc> writes:
>>On disk, enums will occupy 4 bytes: the high 22 bits will be an enum
>>identifier, with the bottom 10 bits being the enum value. This allows
>>1024 values for a given enum, and 2^22 different enum types, both of
>>which should be heaps. The exact distribution of bits doesn't matter all
>>that much, we just picked some that we were comfortable with.
>
>
> I think this is excessive concern for bit-shaving. Make the on-disk
> representation be 8 bytes instead of 4, then you can store the OID
> directly and have no need for the separate identifier concept. This
> in turn eliminates one index, one syscache, and one set of lookup/cache
> routines. And you can have as many values of an enum as you darn please.
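
For reference, the 22/10 bit split in the quoted proposal can be sketched roughly as follows. This is only an illustration of the arithmetic: the names pack_enum and unpack_enum are hypothetical and do not come from any patch.

```python
# Hypothetical sketch of the proposed 4-byte layout:
# high 22 bits = per-type enum identifier, low 10 bits = value within the type.
ENUM_TYPE_BITS = 22
ENUM_VALUE_BITS = 10
VALUE_MASK = (1 << ENUM_VALUE_BITS) - 1  # 0x3FF, i.e. 1024 distinct values


def pack_enum(type_id: int, value_index: int) -> int:
    """Combine an enum identifier and a value index into one 32-bit word."""
    if not 0 <= type_id < (1 << ENUM_TYPE_BITS):
        raise ValueError("enum identifier does not fit in 22 bits")
    if not 0 <= value_index <= VALUE_MASK:
        raise ValueError("enum value does not fit in 10 bits")
    return (type_id << ENUM_VALUE_BITS) | value_index


def unpack_enum(packed: int) -> tuple[int, int]:
    """Split a 32-bit word back into (enum identifier, value index)."""
    return packed >> ENUM_VALUE_BITS, packed & VALUE_MASK
```

The two range checks are exactly the limits under discussion: 2^22 enum types and 1024 values per type.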

That's all true. It's a bit depressing to think that IMO 99% of users of
this will have enum values whose range would fit into 1 byte, but we'll
be using 8 to store it on disk. I had convinced myself that 4 was ok on
the basis that alignment issues in surrounding columns would pad out the
remaining bits anyway much of the time. Was I correct in that
assumption? Would e.g. an int after a char require 3 bytes of padding?
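
One rough way to look at the padding question is to measure native C struct layout, here through Python's struct module. This is only an analogy: it assumes the platform's C ABI, whereas PostgreSQL lays out tuples according to its own per-type alignment settings. The helper name padding_between is made up for this example.

```python
import struct


def padding_between(first: str, second: str) -> int:
    """Padding bytes the native C ABI inserts between two adjacent fields.

    `first` and `second` are struct-module format codes, e.g. "c" for char
    and "i" for int. The "@" prefix selects native sizes and alignment.
    """
    laid_out = struct.calcsize("@" + first + second)
    unpadded = struct.calcsize("@" + first) + struct.calcsize("@" + second)
    return laid_out - unpadded
```

On common platforms where int is 4 bytes with 4-byte alignment, padding_between("c", "i") comes out as 3, while padding_between("c", "c") is 0.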

Ok, I'll run one more idea up the flagpole before giving up on a 4-byte
on-disk representation. :) How about assigning a unique 4-byte id to
each enum value, and storing that on disk? This would be unique across
the database, not per enum type. The structure of pg_enum would be a bit
different, as the per-type enum id would be gone, and there would be
multiple rows for each enum type. The columns would be: the type oid,
the associated unique id and the textual representation. That would
probably simplify the caching mechanism as well, since input function
lookups could do a straight syscache lookup on type oid and text
representation, and the output function could do a straight lookup on
the unique id. No need to muck around creating a little dynahash or
whatever to attach to the fn_extra pointer.
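
A toy model of that alternative, assuming pg_enum rows of (type oid, unique id, label) and two lookup paths standing in for the syscaches. The class and method names (EnumCatalog, add_value, enum_in, enum_out) are hypothetical and only illustrate the shape of the lookups, not actual backend code.

```python
from itertools import count


class EnumCatalog:
    """In-memory stand-in for the proposed pg_enum layout (illustrative only)."""

    def __init__(self) -> None:
        self._next_id = count(1)  # database-wide id generator (assumption)
        self._by_label: dict[tuple[int, str], int] = {}  # (type oid, label) -> id
        self._by_id: dict[int, tuple[int, str]] = {}     # id -> (type oid, label)

    def add_value(self, type_oid: int, label: str) -> int:
        """Add one row: a new unique id for `label` within enum `type_oid`."""
        key = (type_oid, label)
        if key in self._by_label:
            raise ValueError(f"label {label!r} already exists for type {type_oid}")
        value_id = next(self._next_id)
        self._by_label[key] = value_id
        self._by_id[value_id] = key
        return value_id

    def enum_in(self, type_oid: int, label: str) -> int:
        """Input path: straight lookup on (type oid, text representation)."""
        return self._by_label[(type_oid, label)]

    def enum_out(self, value_id: int) -> str:
        """Output path: straight lookup on the unique id alone."""
        return self._by_id[value_id][1]
```

Because the id is unique across the whole catalog, the output path needs no type oid at all, and neither path needs a per-call hash table.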

It does still require the extra syscache, but it removes the limitations
on the number of enum types and the number of values per type while keeping
the on-disk size smallish. I like that better than the original idea, actually.

> If you didn't notice already: typcache is the place to put any
> type-related caching you need to add.

I hadn't. I'll investigate. Thanks.

Cheers

Tom
