Re: A space-efficient, user-friendly way to store categorical data

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Kane <andrew(at)chartkick(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A space-efficient, user-friendly way to store categorical data
Date: 2018-02-11 23:24:29
Message-ID: CAA8=A7-df9JSaVqHy2bRJBfNP=NjqdfmKHMPbPcM6Cs_3x7RoQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 12, 2018 at 9:10 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Kane <andrew(at)chartkick(dot)com> writes:
>> A better option could be a new "dynamic enum" type, which would have
>> similar storage requirements as an enum, but instead of labels being
>> declared ahead of time, they would be added as data is inserted.
>
> You realize, of course, that it's possible to add labels to an enum type
> today. (Removing them is another story.)
>
> You haven't explained exactly what you have in mind that is going to be
> able to duplicate the advantages of the current enum implementation
> without its disadvantages, so it's hard to evaluate this proposal.
>

This sounds rather like the idea I have been tossing around in my head
for a while, and in sporadic discussions with a few people, for a
dictionary object. The idea is to have an append-only list of labels
which would not obey transactional semantics, and would thus help us
avoid the pitfalls of enums - there wouldn't be any rollback of an
addition. The use case would be for a jsonb representation which
would replace object keys with the oid value of the corresponding
dictionary entry, rather like enums do now. We could have a per-table
dictionary which in most typical json use cases would be very small,
and we know from some experimental data that the space savings from
such a change would often be substantial.

This would have to be modifiable dynamically rather than requiring
explicit additions to the dictionary, to be of practical use for the
jsonb case, I believe.
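
To make the idea a bit more concrete, here is a rough sketch in Python
of what I mean by an append-only dictionary that grows as data arrives.
It is purely illustrative and not a proposed implementation: the names
KeyDictionary, intern, encode and decode are made up for this example,
small integers stand in for oids, and nothing here touches the actual
jsonb on-disk format.

```python
class KeyDictionary:
    """Append-only label dictionary (hypothetical, for illustration only).

    Labels are assigned an id the first time they are seen and are never
    removed, so an id handed out once stays valid forever.
    """

    def __init__(self):
        self._id_by_label = {}   # label -> id
        self._label_by_id = []   # id -> label (position in the list is the id)

    def intern(self, label):
        # Return the existing id, or append the label and return its new id.
        ident = self._id_by_label.get(label)
        if ident is None:
            ident = len(self._label_by_id)
            self._label_by_id.append(label)
            self._id_by_label[label] = ident
        return ident

    def label(self, ident):
        return self._label_by_id[ident]

    def __len__(self):
        return len(self._label_by_id)


def encode(value, dictionary):
    """Replace object keys with dictionary ids, recursing into nested values."""
    if isinstance(value, dict):
        return {dictionary.intern(k): encode(v, dictionary)
                for k, v in value.items()}
    if isinstance(value, list):
        return [encode(v, dictionary) for v in value]
    return value


def decode(value, dictionary):
    """Inverse of encode: map ids back to their labels."""
    if isinstance(value, dict):
        return {dictionary.label(k): decode(v, dictionary)
                for k, v in value.items()}
    if isinstance(value, list):
        return [decode(v, dictionary) for v in value]
    return value


if __name__ == "__main__":
    per_table = KeyDictionary()
    doc = {"name": "widget", "tags": [{"name": "blue"}]}
    stored = encode(doc, per_table)
    print(stored)
    print(len(per_table), "distinct keys in the dictionary")
    assert decode(stored, per_table) == doc
```

Note that nothing is ever taken back out of the dictionary here, even
if the caller later throws the encoded document away. That is the
non-transactional behaviour I have in mind: an entry added on behalf
of an insert that later aborts would simply stay.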

I hadn't thought about this as a sort of super enum that was usable
directly by users, but it makes sense.

I have no idea how hard it would be to implement, or even whether it
is possible.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
