Re: WIP: extensible enums

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-25 00:12:46
Message-ID: 23536.1287965566@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/24/2010 05:34 PM, Tom Lane wrote:
>> BTW, I've forgotten --- did anyone publish a test case for checking
>> performance of enum comparisons? Or should I just cons up something
>> privately?

> I have been running tests with
> <http://developer.postgresql.org/~adunstan/enumtest.dmp>

> The table "mydata" contains 100k rows with the column "label" containing
> random values of an enum type with 500 labels. Basically, my main test
> is to set up an index on that column. The alter the enum type, and test
> again. Then alter some of the rows, and test again.

OK, I did some timing consisting of building a btree index with
maintenance_work_mem set reasonably high. This is on a debug-enabled
build, so it's not representative of production performance, but it will
do for seeing what we're doing to enum comparison performance. Here's
what I tried:

Stock 9.0.1 24.9 sec

patch, all OIDs even 25.2 sec (~ 1% hit)

patch, half of OIDs odd 27.2 sec (~ 9% hit)

same, bitmapset forced null 64.9 sec (~ 160% hit)

(Note that the noise level in these measurements is about 1%;
I'm not entirely convinced that the all-even case is really measurably
slower than 9.0.)

The "half of OIDs odd" case is what you'd get for a binary upgrade
from a 9.0 database. The last case shows what happens if the
intermediate bitmapset-test optimization is disabled, forcing all
comparisons to do binary searches in the sorted-by-OID array
(except for the one-quarter of cases where both OIDs are even
by chance). It's pretty grim but it represents a worst case that
you'd be very unlikely to hit in practice.

This shows that the bitmapset optimization really is quite effective,
at least for cases where all the enum labels are sorted by OID after
all. That motivated me to change the bitmapset setup code to what's
attached. This is potentially a little slower at initializing the
cache, but it makes up for that by still marking most enum members
as sorted even when a few out-of-order members have been inserted.
The key point is that an out-of-order member in the middle of the
array doesn't prevent us from considering following members as
properly sorted, as long as they are correctly ordered with respect to
the other properly-sorted members.

With this approach we can honestly say that inserting an out-of-order
enum value doesn't impact comparison performance for pre-existing
enum members, only for comparisons involving the out-of-order value
itself; even when the existing members were binary-upgraded and thus
weren't all even. I think that's a worthwhile advantage.

IMHO this level of performance is good enough. Anyone unhappy?

regards, tom lane

/*
* Here, we create a bitmap listing a subset of the enum's OIDs that are
* known to be in order and can thus be compared with just OID comparison.
*
* The point of this is that the enum's initial OIDs were certainly in
* order, so there is some subset that can be compared via OID comparison;
* and we'd rather not do binary searches unnecessarily.
*
* This is somewhat heuristic, and might identify a subset of OIDs that
* isn't exactly what the type started with. That's okay as long as
* the subset is correctly sorted.
*/
bitmap_base = InvalidOid;
bitmap = NULL;
bm_size = 1; /* only save sets of at least 2 OIDs */

for (start_pos = 0; start_pos < numitems - 1; start_pos++)
{
/*
* Identify longest sorted subsequence starting at start_pos
*/
Bitmapset *this_bitmap = bms_make_singleton(0);
int this_bm_size = 1;
Oid start_oid = items[start_pos].enum_oid;
float4 prev_order = items[start_pos].sort_order;
int i;

for (i = start_pos + 1; i < numitems; i++)
{
Oid offset;

offset = items[i].enum_oid - start_oid;
/* quit if bitmap would be too large; cutoff is arbitrary */
if (offset >= 1024)
break;
/* include the item if it's in-order */
if (items[i].sort_order > prev_order)
{
prev_order = items[i].sort_order;
this_bitmap = bms_add_member(this_bitmap, (int) offset);
this_bm_size++;
}
}

/* Remember it if larger than previous best */
if (this_bm_size > bm_size)
{
bms_free(bitmap);
bitmap_base = start_oid;
bitmap = this_bitmap;
bm_size = this_bm_size;
}
else
bms_free(this_bitmap);

/*
* Done if it's not possible to find a longer sequence in the rest
* of the list. In typical cases this will happen on the first
* iteration, which is why we create the bitmaps on the fly instead
* of doing a second pass over the list.
*/
if (bm_size >= (numitems - start_pos - 1))
break;
}

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2010-10-25 00:24:54 Re: why does plperl cache functions using just a bool for is_trigger
Previous Message Tom Lane 2010-10-24 23:50:03 Re: why does plperl cache functions using just a bool for is_trigger