Skip site navigation (1) Skip section navigation (2)

Re: qsort again (was Re: [PERFORM] Strange Create Index

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Scott Lamb <slamb(at)slamb(dot)org>
Cc: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>,Markus Schaber <schabi(at)logix-tt(dot)com>,pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index
Date: 2006-02-17 16:31:23
Message-ID: 20060217163123.GE9254@svana.org (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
On Fri, Feb 17, 2006 at 08:18:41AM -0800, Scott Lamb wrote:
> On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:
> >Data types which could probably provide a useful function for f  
> >would be
> >int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
> 
> ...and with some work, floats (I think just the exponent would work,  
> if nothing else). bytea. Probably just about anything.
> 
> Interesting. If you abandon the idea that collisions should be  
> impossible (they're not indexes) or extremely rare (they're not  
> hashes), it's pretty easy to come up with a decent hint to avoid a  
> lot of dereferences.

Yep, pretty much for any datatype you create a mapping function to map
it to a signed int32. All you have to guarentee is that f(a) > f(b)
implies that a > b. Only if f(a) == f(b) do you need to compare a and
b.

You then change the sorting code to have an array of (Datum,int32)
(ouch, double the storage) where the int32 is the f(Datum). And in the
comparison routines you first check the int32. If they give an order
you're done. On match you do the full comparison.

For integer types (int2,int4,int8,oid) the conversion is straight
forward. For float you'd use the exponent and the first few bits of the
mantissa. For strings you'd have to bail, or use a strxfrm equivalent.
NULL would be INT_MAX pr INT_MIN depending on where you want it. Thing
is, even if you don't have such a function and always return zero, the
results will still be right.

Not a new idea, but it would be very nice to implement. If would
produce nice speedups for type where comparisons are expensive. And
more importantly, the bulk of the comparisons can be moved inline and
make the whole cache-friendlyness discussed here much more meaningful.

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog(at)svana(dot)org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

pgsql-performance by date

Next:From: RonDate: 2006-02-17 16:44:51
Subject: Re: qsort again (was Re: [PERFORM] Strange Create
Previous:From: Scott LambDate: 2006-02-17 16:18:41
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index

pgsql-hackers by date

Next:From: RonDate: 2006-02-17 16:44:51
Subject: Re: qsort again (was Re: [PERFORM] Strange Create
Previous:From: Scott LambDate: 2006-02-17 16:18:41
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group