Re: WIP: extensible enums

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-23 23:12:10
Message-ID: 10588.1287875530@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Latest patch attached.

I've been working through this patch. It occurs to me that there's a
fairly serious problem with the current implementation of insertion of
new values within the bounds of the current sort ordering. Namely, that
it does that by reassigning the enumsortorder values of pre-existing
rows. That creates a race condition: suppose that our backend is doing
that while another process is busy loading its internal cache of values
of the enum. If we commit partway through the other process's loading
of its cache, it may end up with a cache containing some pre-commit
entries and some post-commit entries. In the worst case it might even
have two images of the same enum label, with different enumsortorder
values. Needless to say, this is catastrophic for correctness of
subsequent comparisons in the other process.

We could try to avoid the race condition, but it's not going to be easy.
I think a better idea is to avoid having to issue updates against
pg_enum rows once they are inserted. To do that, I propose that instead
of integer enumsortorder values, we use float8 values. The initial
entries for a type would still have numbers 1..n, but when we need to
insert a value between two existing entries, we assign it a value
halfway between their enumsortorder values. Then we never have to alter
pre-existing entries, and there's no race condition: at worst, a
process's cache entry might be missing some just-committed rows, and we
know how to deal with that.
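To make the proposal concrete, here is a minimal sketch in Python (the real code would of course live in the pg_enum C code; the function names and dict representation here are purely illustrative): each label carries a float8 sort-order key, and inserting between two neighbors takes the midpoint of their keys, so pre-existing entries are never updated.

```python
def initial_order(labels):
    """Assign 1..n to the initial enum labels, as CREATE TYPE would."""
    return {label: float(i) for i, label in enumerate(labels, start=1)}

def insert_between(orders, new_label, lo_label, hi_label):
    """Insert new_label strictly between two existing labels by taking
    the midpoint of their sort-order values.  No pre-existing entry is
    touched, so a concurrent cache load can at worst miss the new row."""
    lo, hi = orders[lo_label], orders[hi_label]
    mid = lo + (hi - lo) / 2.0
    if mid <= lo or mid >= hi:
        # midpoint no longer distinguishable: out of room at this spot
        raise ValueError("no room between %r and %r" % (lo_label, hi_label))
    orders[new_label] = mid
    return mid

orders = initial_order(["small", "large"])        # small=1.0, large=2.0
insert_between(orders, "medium", "small", "large")  # medium gets 1.5
```

Sorting labels by their key then gives small, medium, large without any row ever having been rewritten.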

The disadvantage of this scheme is that if you repeatedly insert entries
in the "same place" in the sort order, you halve the available range
each time, so you'd run out of room after on the order of fifty halvings.
The values would eventually differ by only one unit in the last place,
making it impossible to insert another value that is still
distinguishable in the sort order. Is that an acceptable restriction?
(If you did run into this, you could manually reassign enumsortorder
values to get out of it, without having to dump-and-reload; but you'd
have to beware of the same race condition as above.) Of course adding
new values at the start or end of the enum's list wouldn't have that
restriction.
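The limit is easy to check directly. In the worst case every new entry goes between the lowest existing value and the previously inserted one, so the gap shrinks by half each time; with IEEE-754 double precision (which is what float8 is), a unit-sized gap is exhausted after 52 halvings, when the midpoint rounds back to the lower bound:

```python
# Repeatedly bisect just above lo, simulating worst-case insertion
# "in the same place" between adjacent initial sort orders 1.0 and 2.0.
lo, hi = 1.0, 2.0
count = 0
while True:
    mid = lo + (hi - lo) / 2.0
    if mid <= lo or mid >= hi:
        break          # midpoint no longer distinguishable from lo
    hi = mid           # next insertion goes between lo and the new value
    count += 1
print(count)           # 52 for an IEEE double with a 52-bit mantissa
```

So "order of fifty" is exactly right for adjacent initial entries; gaps that start wider, or insertions spread across different spots, take correspondingly longer to exhaust.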

Thoughts?

regards, tom lane
