Just to add a little anarchy in your nice debate...
Who really needs all the results of a sort on your terabyte table ?
I guess not many people do a SELECT from such a table and want all the
results. So, this leaves :
- Really wanting all the results, to fetch using a cursor,
- CLUSTER type things, where you really want everything in order,
- Aggregates (Sort->GroupAggregate), which might really need to sort the
- Complex queries where the whole dataset needs to be examined, in order
to return a few values
- Joins (again, the whole table is probably not going to be selected)
- And the ones I forgot.
Most likely you only want to SELECT N rows, in some ordering :
- the first N (ORDER BY x LIMIT N)
- last N (ORDER BY x DESC LIMIT N)
- WHERE x>value ORDER BY x LIMIT N
- WHERE x<value ORDER BY x DESC LIMIT N
- and other variants
Or, you are doing a Merge JOIN against some other table ; in that case,
yes, you might need the whole sorted terabyte table, but most likely there
are WHERE clauses in the query that restrict the set, and thus, maybe we
can get some conditions or limit values on the column to sort.
Also the new, optimized hash join, which is more memory efficient, might
cover this case.
Point is, sometimes, you only need part of the results of your sort. And
the bigger the sort, the most likely it becomes that you only want part of
the results. So, while we're in the fun hand-waving, new algorithm trying
mode, why not consider this right from the start ? (I know I'm totally in
hand-waving mode right now, so slap me if needed).
I'd say your new, fancy sort algorithm needs a few more input values :
- Range of values that must appear in the final result of the sort :
none, minimum, maximum, both, or even a set of values from the other
side of the join, hashed, or sorted.
- LIMIT information (first N, last N, none)
- Enhanced Limit information (first/last N values of the second column to
sort, for each value of the first column) (the infamous "top10 by
With this, the amount of data that needs to be kept in memory is
dramatically reduced, from the whole table (even using your compressed
keys, that's big) to something more manageable which will be closer to the
size of the final result set which will be returned to the client, and
avoid a lot of effort.
So, this would not be useful in all cases, but when it applies, it would
be really useful.
In response to
pgsql-performance by date
|Next:||From: PFC||Date: 2005-09-29 16:12:52|
|Subject: Re: Comparative performance|
|Previous:||From: Michael Ben-Nes||Date: 2005-09-29 14:25:49|
|Subject: Re: Advice on RAID card|
pgsql-hackers by date
|Next:||From: Tom Lane||Date: 2005-09-29 16:22:59|
|Subject: Re: pgbench: undefined reference to strndup() |
|Previous:||From: Tino Wildenhain||Date: 2005-09-29 15:33:28|
|Subject: Re: postgresql clustering|