Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

From: Victor Nordam Suadicani <v(dot)n(dot)suadicani(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?
Date: 2023-05-18 12:27:27
Message-ID: CAHbE9O3RAR+J2bvA47rLbkTmUpV_U7ctr-rHjDPA+77GCs-=EQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

Is there any nice way to handle sum types (aka tagged unions) in a
PostgreSQL database? I've searched far and wide and have not reached any
satisfying answer.

As a (somewhat contrived) example, say I have the following enum in Rust:

enum TaggedUnion {
Variant1(String),
Variant2(i32),
Variant3(f64),
}

How might I best save this data in a PostgreSQL database? There is to my
knowledge unfortunately no way to "natively" handle sum types like this.

One method would be to have 3 different tables, one for each variant. This
is not a great solution as you can't (as far as I know) easily query for
all variants at once (for instance, to serialize into a Vec<TaggedUnion> on
the Rust side).

Another method would be to use PostgreSQL table inheritance. This has the
same problem as the above workaround but also has the issue that you could
put something into the supertable without having a corresponding entry in
any of the subtables (basically a value not inhabiting any of the variants,
which is nonsense).

A third method would be to save all fields of all variants into a single
table, with all fields being nullable. So you'd have a nullable text field,
nullable integer and nullable double precision field. You'd then need an
additional tag field to indicate which variant of the union is used and
you'd have to write check constraints for each variant to ensure that all
the fields in that variant are not null and all the fields not in that
variant *are* null. This *almost* works, but has two major problems:

1. It wastes space. In Rust, an enum is only as big as its largest variant.
Using this method, a table row would be as big as the sum of all the
variants.

2. Querying the data is very cumbersome, as there is no way to indicate
(beyond check constraints) that, given a specific tag, certain other fields
must be filled while certain other fields must not be. For instance, the
nullability of fields can be used to serialize into the Option type in
Rust. There is no "nice" way to tell the host language that the nullability
of the variant fields is hinged on the value of the extra tag field.

Both of these problems get bigger and bigger as you add more variants - it
doesn't scale well.

Does anyone know of better methods? I realize the example enum is contrived
but this kind of thing comes up *a lot* in all kinds of places. For
instance different kinds of messages that all have various different
associated data. Or different events that record different additional data.
Sum types are ubiquitous in languages like Rust and Haskell.

If there are no good methods of handling this, is there any way PostgreSQL
could be extended with capabilities for this? I have no idea how this would
be done in practice though. Perhaps SQL itself is just unsuited for data of
this kind? I don't really see why it should be though.

Thanks,
Victor Nordam Suadicani

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Brian Mendoza 2023-05-18 13:17:04 JSONB operator unanticipated behaviour
Previous Message Andreas Kretschmer 2023-05-18 09:31:08 Re: Would PostgreSQL 16 native transparent data encryption support database level encryption?