Re: A space-efficient, user-friendly way to store categorical data

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Andrew Kane <andrew(at)chartkick(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: A space-efficient, user-friendly way to store categorical data
Date: 2018-02-13 03:01:33
Message-ID: A838A06F-5481-49BF-A4F1-23A17F7779DB@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On Feb 12, 2018, at 5:08 PM, Andrew Kane <andrew(at)chartkick(dot)com> wrote:
>
> Thanks everyone for the feedback. The current enum implementation requires you to create a new type and add labels outside a transaction prior to an insert.
>
> -- on table creation
> CREATE TYPE city AS ENUM ();
> CREATE TABLE "users" ("city" city);
>
> -- on insert
> ALTER TYPE city ADD VALUE IF NOT EXISTS 'Chicago';
> BEGIN;
> INSERT INTO "users" ("city") VALUES ('Chicago');
> COMMIT;
>
> What would be ideal:
>
> -- on table creation
> CREATE TABLE "users" ("city" dynamic_enum);
>
> -- on insert
> BEGIN;
> INSERT INTO "users" ("city") VALUES ('Chicago');
> COMMIT;
>
> Since enums have a fixed number of labels, this type of feature may be better off as a property you could add to text columns (as Thomas mentions). This would avoid issues with hitting the max number of labels.

In your proposed feature, what happens if I create two tables:

CREATE TABLE myusers (city dynamic_enum);
CREATE TABLE yourusers (city dynamic_enum);

Do you imagine that myusers and yourusers are referring to the
same enum or to two different enums? Are the enums stored in
a new table within pg_catalog, or are they stored in something akin
to a toast table? If you insert billions of rows into a table, but only
have 30 distinct values, can you quickly query for all 30 distinct enum
values, or would you have to walk billions of rows to find them all?

mark

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2018-02-13 03:11:25 Re: A space-efficient, user-friendly way to store categorical data
Previous Message Masahiko Sawada 2018-02-13 02:58:35 Typo in origin.c