Re: qsort again (was Re: [PERFORM] Strange Create

From: Ron <rjpeace(at)earthlink(dot)net>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-performance(at)postgresql(dot)org,pgsql-hackers(at)postgresql(dot)org
Subject: Re: qsort again (was Re: [PERFORM] Strange Create
Date: 2006-02-17 16:44:51
Message-ID: 7.0.1.0.2.20060217111116.03a509e8@earthlink.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

At 10:53 AM 2/17/2006, Martijn van Oosterhout wrote:
>On Fri, Feb 17, 2006 at 08:23:40AM -0500, Ron wrote:
> > >For this mapping, you need a full table sort.
> > One physical IO pass should be all that's needed. However, let's
> > pretend you are correct and that we do need to sort the table to get
> > the key mapping. Even so, we would only need to do it =once= and
> > then we would be able to use and incrementally update the results
> > forever afterward. Even under this assumption, one external sort to
> > save all subsequent such sorts seems well worth it.
> >
> > IOW, even if I'm wrong about the initial cost to do this; it is still
> > worth doing ;-)
>
>I think you're talking about something different here. You're thinking
>of having the whole table sorted and when you add a new value you add
>it in such a way to keep it sorted. The problem is, what do you sort it
>by? If you've sorted the table by col1, then when the user does ORDER
>BY col2 it's useless.
No, I'm thinking about how to represent DB row data in such a way that
a= we use a compact enough representation that we can sort internally
rather than externally.
b= we do the sort once and avoid most of the overhead upon subsequent
similar requests.

I used the example of sorting on the entire row to show that the
approach works even when the original record being sorted by is very large.
All my previous comments on this topic hold for the case where we are
sorting on only part of a row as well.

If all you are doing is sorting on a column or a few columns, what
I'm discussing is even easier since treating the columns actually
being used a sort criteria as integers rather than the whole row as
an atomic unit eats less resources during the key creation and
mapping process. If the row is 2-4KB in size, but we only care about
some combination of columns that only takes on <= 2^8 or <= 2^16
different values, then what I've described will be even better than
the original example I gave.

Basically, the range of a key is going to be restricted by how
a= big the field is that represents the key (columns and such are
usually kept narrow for performance reasons) or
b= big each row is (the more space each row takes, the fewer rows fit
into any given amount of storage)
c= many rows there are in the table
Between the conditions, the range of a key tends to be severely
restricted and therefore use much less space than sorting the actual
DB records would. ...and that gives us something we can take advantage of.

>Indeed, this is what btrees do, you store the order of the table
>seperate from the data. And you can store multiple orders. But even
>then, when someone does ORDER BY lower(col1), it's still useless.
>
>And you're right, we still need to do the single mass sort in the
>beginning, which is precisely what we're trying to optimise here.
Sigh. My points were:
1= we have information available to us that allows us to map the rows
in such a way as to turn most external sorts into internal sorts,
thereby avoiding the entire external sorting problem in those
cases. This is a huge performance improvement.

2= that an external sort is =NOT= required for initial key
assignment, but even if it was it would be worth it.

3= that this is a variation of a well known technique so I'm not
suggesting heresy here.

Ron

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2006-02-17 16:44:57 Re: Updated email signature
Previous Message Martijn van Oosterhout 2006-02-17 16:31:23 Re: qsort again (was Re: [PERFORM] Strange Create Index

Browse pgsql-performance by date

  From Date Subject
Next Message Ron 2006-02-17 16:51:26 Need pointers to "standard" pg database(s) for testing
Previous Message Martijn van Oosterhout 2006-02-17 16:31:23 Re: qsort again (was Re: [PERFORM] Strange Create Index