Re: qsort again (was Re: [PERFORM] Strange Create

From: Ron <rjpeace(at)earthlink(dot)net>
To: Markus Schaber <schabi(at)logix-tt(dot)com>,pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: qsort again (was Re: [PERFORM] Strange Create
Date: 2006-02-17 13:23:40
Message-ID: 7.0.1.0.2.20060217080226.03a11010@earthlink.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

At 05:19 AM 2/17/2006, Markus Schaber wrote:
>Hi, Ron,
>
>Ron schrieb:
>
> > OK, so here's _a_ way (there are others) to obtain a mapping such that
> > if a < b then f(a) < f (b) and
> > if a == b then f(a) == f(b)
> >
> > Pretend each row is a integer of row size (so a 2KB row becomes a 16Kb
> > integer; a 4KB row becomes a 32Kb integer; etc)
> > Since even a 1TB table made of such rows can only have 256M - 512M
> > possible values even if each row is unique, a 28b or 29b key is large
> > enough to represent each row's value and relative rank compared to all
> > of the others even if all row values are unique.
> >
> > By scanning the table once, we can map say 0000001h (Hex used to ease
> > typing) to the row with the minimum value and 1111111h to the row with
> > the maximum value as well as mapping everything in between to their
> > appropriate keys. That same scan can be used to assign a pointer to
> > each record's location.
>
>But with a single linear scan, this cannot be accomplished, as the table
>contents are neither sorted nor distributed linearly between the minimum
>and the maximum.
So what? We are talking about key assignment here, not anything that
requires physically manipulating the actual DB rows.
One physical IO pass should be all that's needed.

>For this mapping, you need a full table sort.
One physical IO pass should be all that's needed. However, let's
pretend you are correct and that we do need to sort the table to get
the key mapping. Even so, we would only need to do it =once= and
then we would be able to use and incrementally update the results
forever afterward. Even under this assumption, one external sort to
save all subsequent such sorts seems well worth it.

IOW, even if I'm wrong about the initial cost to do this; it is still
worth doing ;-)

> > That initial scan to set up the keys is expensive, but if we wish that
> > cost can be amortized over the life of the table so we don't have to pay
> > it all at once. In addition, once we have created those keys, then can
> > be saved for later searches and sorts.
>
>But for every update or insert, you have to resort the keys, which is
>_very_ expensive as it basically needs to update a huge part of the table.

??? You do not need to resort already ordered data to insert a new
element into it such that the data stays ordered! Once we have done
the key ordering operation once, we should not ever need to do it
again on the original data. Else most sorting algorithms wouldn't work ;-)

Ron

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-02-17 15:40:40 Updated email signature
Previous Message Ron 2006-02-17 13:01:34 Re: qsort again (was Re: [PERFORM] Strange Create

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-02-17 14:49:05 Re: split partitioned table across several postgres servers
Previous Message Ron 2006-02-17 13:01:34 Re: qsort again (was Re: [PERFORM] Strange Create