Re: qsort again (was Re: [PERFORM] Strange Create Index

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Scott Lamb <slamb(at)slamb(dot)org>
Cc: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>, Markus Schaber <schabi(at)logix-tt(dot)com>, pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index
Date: 2006-02-17 16:31:23
Message-ID: 20060217163123.GE9254@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 17, 2006 at 08:18:41AM -0800, Scott Lamb wrote:
> On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:
> >Data types which could probably provide a useful function for f
> >would be
> >int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
>
> ...and with some work, floats (I think just the exponent would work,
> if nothing else). bytea. Probably just about anything.
>
> Interesting. If you abandon the idea that collisions should be
> impossible (they're not indexes) or extremely rare (they're not
> hashes), it's pretty easy to come up with a decent hint to avoid a
> lot of dereferences.

Yep, pretty much for any datatype you create a mapping function to map
it to a signed int32. All you have to guarentee is that f(a) > f(b)
implies that a > b. Only if f(a) == f(b) do you need to compare a and
b.

You then change the sorting code to have an array of (Datum,int32)
(ouch, double the storage) where the int32 is the f(Datum). And in the
comparison routines you first check the int32. If they give an order
you're done. On match you do the full comparison.

For integer types (int2,int4,int8,oid) the conversion is straight
forward. For float you'd use the exponent and the first few bits of the
mantissa. For strings you'd have to bail, or use a strxfrm equivalent.
NULL would be INT_MAX pr INT_MIN depending on where you want it. Thing
is, even if you don't have such a function and always return zero, the
results will still be right.

Not a new idea, but it would be very nice to implement. If would
produce nice speedups for type where comparisons are expensive. And
more importantly, the bulk of the comparisons can be moved inline and
make the whole cache-friendlyness discussed here much more meaningful.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ron 2006-02-17 16:44:51 Re: qsort again (was Re: [PERFORM] Strange Create
Previous Message Scott Lamb 2006-02-17 16:18:41 Re: qsort again (was Re: [PERFORM] Strange Create Index

Browse pgsql-performance by date

  From Date Subject
Next Message Ron 2006-02-17 16:44:51 Re: qsort again (was Re: [PERFORM] Strange Create
Previous Message Scott Lamb 2006-02-17 16:18:41 Re: qsort again (was Re: [PERFORM] Strange Create Index