Re: [HACKERS] qsort again (was Re: Strange Create

From: PFC <lists(at)peufeu(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] qsort again (was Re: Strange Create
Date: 2006-02-17 10:55:34
Message-ID: op.s435ywc1cigqcu@apollo13
Lists: pgsql-hackers pgsql-performance
	Has anybody got some profiler data on the amount of time spent in
comparisons during a sort? Say, the proposals here would give the most
gains on simple types like INTEGER, so it would be interesting to know
how much time is now spent in comparisons when sorting a column of
ints. If it's like 10% of the total time, well...

	More hand-waving:

	What are the usage cases for sorts?

	- potentially huge data sets: CREATE INDEX, big joins, reporting
queries, etc.
	- small data sets: typically a query with an ORDER BY that returns
a small number of rows (website stuff), or joins not small enough to
use a HashAggregate but not big enough to justify creating an index
just for them.

	The cost of a comparison vs. moving stuff around and fetching
stuff is probably very different in these two cases. If it all neatly
fits in sort_mem, you can do fancy stuff (like sorting on SortKeys),
which will need to access the data in almost random order when the
time comes to hand the sorted data back. So I guess the SortKey idea
would rather apply to the latter case only, which is CPU-limited.
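
	A minimal sketch of what I mean, in plain C rather than Postgres
code (names are made up): sort small key/pointer pairs, and chase the
pointers to the rows only when handing them back:

#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: sort small (key, pointer) pairs instead of
 * whole rows.  Only the final pass dereferences the pointers, touching
 * the rows in what is by then essentially random order. */
typedef struct {
    int   key;    /* extracted sort key */
    void *tuple;  /* the full row, left where it is */
} SortKey;

static int cmp_keys(const void *a, const void *b)
{
    int ka = ((const SortKey *) a)->key;
    int kb = ((const SortKey *) b)->key;
    return (ka > kb) - (ka < kb);
}

int main(void)
{
    const char *rows[] = { "row-c", "row-a", "row-b" };
    int keys[] = { 3, 1, 2 };
    SortKey sk[3];

    for (int i = 0; i < 3; i++) {
        sk[i].key = keys[i];
        sk[i].tuple = (void *) rows[i];
    }
    qsort(sk, 3, sizeof sk[0], cmp_keys);  /* moves pairs, not rows */

    for (int i = 0; i < 3; i++)            /* random-order row access */
        puts((const char *) sk[i].tuple);
    return 0;
}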

	Anyway, I was wondering about queries with multipart keys, like
ORDER BY zipcode, client_name, date and the like. Using just an int64
as the key isn't going to help a lot here. Why not use a binary string
of limited length? I'd tend to think it would not be much slower than
comparing ints, and it would be faster than calling each column's
comparison function in turn. Each key part would get assigned a byte
range in the string.
	It would waste some memory, but for instance, using 2x the memory
for half the time would be a good tradeoff if the amount of memory
involved is on the order of megabytes.
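
	A rough sketch, with an invented layout (a 4-byte big-endian int
with its sign bit flipped, then a zero-padded fixed-width text field),
of how a single memcmp() over such a packed key can stand in for the
column-by-column comparison:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Invented layout for ORDER BY zipcode, client_name: a big-endian int
 * with its sign bit flipped (so unsigned byte order matches signed
 * order), then a fixed-width, zero-padded name field.  One memcmp()
 * over the whole buffer then orders like comparing column by column. */
#define NAME_WIDTH 16
#define KEY_WIDTH  (4 + NAME_WIDTH)

static void pack_key(unsigned char *key, int32_t zipcode, const char *name)
{
    uint32_t u = (uint32_t) zipcode ^ 0x80000000u;  /* flip sign bit */
    size_t len = strlen(name);

    key[0] = (u >> 24) & 0xFF;   /* most significant byte first */
    key[1] = (u >> 16) & 0xFF;
    key[2] = (u >> 8) & 0xFF;
    key[3] = u & 0xFF;
    if (len > NAME_WIDTH)
        len = NAME_WIDTH;        /* truncate; ties would need a fallback */
    memset(key + 4, 0, NAME_WIDTH);  /* zero pad: shorter sorts first */
    memcpy(key + 4, name, len);
}

int main(void)
{
    unsigned char a[KEY_WIDTH], b[KEY_WIDTH];

    pack_key(a, -42, "Dupont");
    pack_key(b, 75011, "Dupond");
    printf("%d\n", memcmp(a, b, KEY_WIDTH) < 0);  /* 1: -42 sorts first */
    return 0;
}
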
	Also, it would solve the problem of locales. Comparisons involving
locales are slow, but strings can be converted once to a canonical
form (removing accents and such), and then binary sorted.
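
	Plain C already has a primitive for exactly this, strxfrm(): it
transforms a string once so that byte-wise comparison of the results
matches strcoll() order. A small sketch:

#include <locale.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    setlocale(LC_COLLATE, "");  /* collate per the environment's locale */

    const char *s1 = "cote", *s2 = "côté";
    char k1[64], k2[64];        /* ample here; real code would check
                                   the return value of strxfrm() */

    /* strxfrm() pays the locale cost once per value: the transformed
     * strings compare with plain strcmp()/memcmp() in strcoll() order. */
    strxfrm(k1, s1, sizeof k1);
    strxfrm(k2, s2, sizeof k2);

    printf("strcoll: %d, strcmp on transformed: %d\n",
           strcoll(s1, s2), strcmp(k1, k2));  /* same sign */
    return 0;
}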

	Also, I'll insert a plug for the idea that the Sort needs to know
if there will be a LIMIT afterwards; that way it could reduce its
working set by simply discarding the rows which the LIMIT would have
discarded anyway. Say you want the first 100 rows out of a million
ordered rows. If the sort knows this, it can be done in the amount of
memory needed for a 100-row sort.
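
	A toy sketch of that: a bounded max-heap of LIMIT entries keeps
only the smallest keys seen so far, so a million input rows never need
more than 100 slots:

#include <stdio.h>
#include <stdlib.h>

/* Sketch of a LIMIT-aware sort: keep only the LIMIT smallest keys seen
 * so far in a max-heap, so memory stays O(LIMIT) instead of O(rows). */
#define LIMIT 100

static void sift_down(int *heap, size_t n, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && heap[l] > heap[m]) m = l;
        if (r < n && heap[r] > heap[m]) m = r;
        if (m == i) break;
        int tmp = heap[i]; heap[i] = heap[m]; heap[m] = tmp;
        i = m;
    }
}

int main(void)
{
    int heap[LIMIT];
    size_t n = 0;

    for (long row = 0; row < 1000000L; row++) {
        int key = rand();            /* stand-in for one row's sort key */
        if (n < LIMIT) {
            heap[n++] = key;         /* fill the buffer first */
            if (n == LIMIT)          /* then heapify it once */
                for (size_t i = LIMIT / 2; i-- > 0; )
                    sift_down(heap, LIMIT, i);
        } else if (key < heap[0]) {  /* beats the current 100th value */
            heap[0] = key;           /* replace the max, restore heap */
            sift_down(heap, LIMIT, 0);
        }
    }
    /* heap now holds the 100 smallest keys; one small final sort
     * hands them back in order. */
    printf("cutoff (largest kept key): %d\n", heap[0]);
    return 0;
}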

