Skip site navigation (1) Skip section navigation (2)

Re: qsort again (was Re: [PERFORM] Strange Create

From: Ron <rjpeace(at)earthlink(dot)net>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-performance(at)postgresql(dot)org,pgsql-hackers(at)postgresql(dot)org
Subject: Re: qsort again (was Re: [PERFORM] Strange Create
Date: 2006-02-17 16:44:51
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackerspgsql-performance
At 10:53 AM 2/17/2006, Martijn van Oosterhout wrote:
>On Fri, Feb 17, 2006 at 08:23:40AM -0500, Ron wrote:
> > >For this mapping, you need a full table sort.
> > One physical IO pass should be all that's needed.  However, let's
> > pretend you are correct and that we do need to sort the table to get
> > the key mapping.  Even so, we would only need to do it =once= and
> > then we would be able to use and incrementally update the results
> > forever afterward.  Even under this assumption, one external sort to
> > save all subsequent such sorts seems well worth it.
> >
> > IOW, even if I'm wrong about the initial cost to do this; it is still
> > worth doing ;-)
>I think you're talking about something different here. You're thinking
>of having the whole table sorted and when you add a new value you add
>it in such a way to keep it sorted. The problem is, what do you sort it
>by? If you've sorted the table by col1, then when the user does ORDER
>BY col2 it's useless.
No, I'm thinking about how to represent DB row data in such a way that
a= we use a compact enough representation that we can sort internally 
rather than externally.
b= we do the sort once and avoid most of the overhead upon subsequent 
similar requests.

I used the example of sorting on the entire row to show that the 
approach works even when the original record being sorted by is very large.
All my previous comments on this topic hold for the case where we are 
sorting on only part of a row as well.

If all you are doing is sorting on a column or a few columns, what 
I'm discussing is even easier since treating the columns actually 
being used a sort criteria as integers rather than the whole row as 
an atomic unit eats less resources during the key creation and 
mapping process.  If the row is 2-4KB in size, but we only care about 
some combination of columns that only takes on <= 2^8 or <= 2^16 
different values, then what I've described will be even better than 
the original example I gave.

Basically, the range of a key is going to be restricted by how
a= big the field is that represents the key (columns and such are 
usually kept narrow for performance reasons) or
b= big each row is (the more space each row takes, the fewer rows fit 
into any given amount of storage)
c= many rows there are in the table
Between the conditions, the range of a key tends to be severely 
restricted and therefore use much less space than sorting the actual 
DB records would.  ...and that gives us something we can take advantage of.

>Indeed, this is what btrees do, you store the order of the table
>seperate from the data. And you can store multiple orders. But even
>then, when someone does ORDER BY lower(col1), it's still useless.
>And you're right, we still need to do the single mass sort in the
>beginning, which is precisely what we're trying to optimise here.
Sigh.  My points were:
1= we have information available to us that allows us to map the rows 
in such a way as to turn most external sorts into internal sorts, 
thereby avoiding the entire external sorting problem in those 
cases.  This is a huge performance improvement.

2= that an external sort is =NOT= required for initial key 
assignment, but even if it was it would be worth it.

3= that this is a variation of a well known technique so I'm not 
suggesting heresy here.


In response to

pgsql-performance by date

Next:From: RonDate: 2006-02-17 16:51:26
Subject: Need pointers to "standard" pg database(s) for testing
Previous:From: Martijn van OosterhoutDate: 2006-02-17 16:31:23
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index

pgsql-hackers by date

Next:From: Joshua D. DrakeDate: 2006-02-17 16:44:57
Subject: Re: Updated email signature
Previous:From: Martijn van OosterhoutDate: 2006-02-17 16:31:23
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group