Re: Evaluation of secondary sort key.

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, David Fetter <david(at)fetter(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Evaluation of secondary sort key.
Date: 2011-04-18 05:25:47
Message-ID: 4DABCB5B.4080606@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2011-04-09 18:54, Tom Lane wrote:
> I think that would be a positive disimprovement. The current design
> guarantees that volatile sort expressions are evaluated exactly once,
> in the order the rows are read from the data source. There would be no
> guarantees at all, either as to the number of evaluations or the order
> in which they happen, if we tried to do evaluation only during the
> actual sort.
If I could "only evaluate" the rows needed, then that would also
be a huge win, below case shifts what "definitely shouldn't be a seqscan"
into one due to a secondary sort key.

> Another small problem is that any such thing would require carrying
> along some kind of closure (ie, the expression and all its input
> values), not just the final sort key value, in tuples being sorted.
> The ensuing complexity, speed penalty, and increase in data volume
> to be sorted would be paid by everybody, making this probably a net
> performance loss when considered across all applications.
Getting the value for the first sortkey and carrying on a closure
for the rest would mostly (very often) be "optimal" ?

It would also enable a select that has to sortkeys to utilize an
index that only contains the primary sortkey, which is a huge
negative effect of what's being done today.

2011-04-18 07:12:43.931 testdb=# explain select id from testdb.testtable
order by id limit 500;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Limit (cost=0.00..262.75 rows=500 width=4)
-> Index Scan using testtable_pkey on testtable
(cost=0.00..15015567.84 rows=28573400 width=4)
(2 rows)

Time: 1.363 ms
2011-04-18 07:12:52.498 testdb=# explain select id from testdb.testtable
order by id,name limit 500;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=5165456.70..5165457.95 rows=500 width=35)
-> Sort (cost=5165456.70..5236890.20 rows=28573400 width=35)
Sort Key: id, name
-> Seq Scan on testtable (cost=0.00..3741675.00
rows=28573400 width=35)
(4 rows)

Time: 1.420 ms

Enabling any users to sort using multiple keys, without ending in Seq
Scans somewhere down
the line seems to require multi dimension indexes on all combinations of
colums

--
Jesper

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-04-18 05:43:50 Re: Formatting Curmudgeons WAS: MMAP Buffers
Previous Message Greg Smith 2011-04-18 04:48:06 Re: Formatting Curmudgeons WAS: MMAP Buffers