Re: WIP: extensible enums

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 19:28:03
Message-ID: 15699.1287948483@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/24/2010 12:20 PM, Tom Lane wrote:
>> With float4 the implementation would fail at somewhere
>> around 2^24 elements in an enum (since even with renumbering, there
>> wouldn't be enough bits to give each element a distinguishable value).
>> I don't see that as a real objection, and anyway if you were trying
>> to have an enum with many elements, you'd want the in-memory
>> representation to be compact.

> Anything beyond the square root of this is getting pretty insane,
> IMNSHO, so I'm really not that bothered by that number.

Here's a WIP patch that incorporates most of what's been discussed here.
The critical part of it is summed up in the comments for RenumberEnumType:

/*
* RenumberEnumType
* Renumber existing enum elements to have sort positions 1..n.
*
* We avoid doing this unless absolutely necessary; in most installations
* it will never happen. The reason is that updating existing pg_enum
* entries creates hazards for other backends that are concurrently reading
* pg_enum with SnapshotNow semantics. A concurrent SnapshotNow scan could
* see both old and new versions of an updated row as valid, or neither of
* them, if the commit happens between scanning the two versions. It's
* also quite likely for a concurrent scan to see an inconsistent set of
* rows (some members updated, some not).
*
* We can avoid these risks by reading pg_enum with an MVCC snapshot
* instead of SnapshotNow, but that forecloses use of the syscaches.
* We therefore make the following choices:
*
* 1. Any code that is interested in the enumsortorder values MUST read
* pg_enum with an MVCC snapshot, or else acquire lock on the enum type
* to prevent concurrent execution of AddEnumLabel(). The risk of
* seeing inconsistent values of enumsortorder is too high otherwise.
*
* 2. Code that is not examining enumsortorder can use a syscache
* (for example, enum_in and enum_out do so). The worst that can happen
* is a transient failure to find any valid value of the row. This is
* judged acceptable in view of the infrequency of use of RenumberEnumType.
*/

This patch isn't committable as-is because I haven't made enum_first,
enum_last, enum_range follow these coding rules: they need to stop
using the syscache and instead use an indexscan on
pg_enum_typid_sortorder_index to locate the relevant rows. That should
be just a small fix though, and it seems likely to be a net win for
performance anyway. There are a couple of other loose ends
too, in particular I still think we need to prevent ALTER TYPE ADD
inside a transaction block because of the risk of finding undefined
enum OIDs in indexes.

Anybody really unhappy with this approach? If not, I'll finish this
up and commit it.

regards, tom lane

Attachment Content-Type Size
venum10.patch text/x-patch 74.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-10-24 19:33:42 Re: WIP: extensible enums
Previous Message Robert Haas 2010-10-24 17:43:35 Re: ask for review of MERGE