Quick Links

Re: Evaluation of secondary sort key.

From:	Jesper Krogh <jesper(at)krogh(dot)cc>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Martijn van Oosterhout <kleptog(at)svana(dot)org>, David Fetter <david(at)fetter(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Evaluation of secondary sort key.
Date:	2011-04-18 05:25:47
Message-ID:	4DABCB5B.4080606@krogh.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2011-04-09 18:54, Tom Lane wrote:
> I think that would be a positive disimprovement. The current design
> guarantees that volatile sort expressions are evaluated exactly once,
> in the order the rows are read from the data source. There would be no
> guarantees at all, either as to the number of evaluations or the order
> in which they happen, if we tried to do evaluation only during the
> actual sort.
If I could "only evaluate" the rows needed, then that would also
be a huge win, below case shifts what "definitely shouldn't be a seqscan"
into one due to a secondary sort key.

> Another small problem is that any such thing would require carrying
> along some kind of closure (ie, the expression and all its input
> values), not just the final sort key value, in tuples being sorted.
> The ensuing complexity, speed penalty, and increase in data volume
> to be sorted would be paid by everybody, making this probably a net
> performance loss when considered across all applications.
Getting the value for the first sortkey and carrying on a closure
for the rest would mostly (very often) be "optimal" ?

It would also enable a select that has to sortkeys to utilize an
index that only contains the primary sortkey, which is a huge
negative effect of what's being done today.

2011-04-18 07:12:43.931 testdb=# explain select id from testdb.testtable
order by id limit 500;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Limit (cost=0.00..262.75 rows=500 width=4)
-> Index Scan using testtable_pkey on testtable
(cost=0.00..15015567.84 rows=28573400 width=4)
(2 rows)

Time: 1.363 ms
2011-04-18 07:12:52.498 testdb=# explain select id from testdb.testtable
order by id,name limit 500;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=5165456.70..5165457.95 rows=500 width=35)
-> Sort (cost=5165456.70..5236890.20 rows=28573400 width=35)
Sort Key: id, name
-> Seq Scan on testtable (cost=0.00..3741675.00
rows=28573400 width=35)
(4 rows)

Time: 1.420 ms

Enabling any users to sort using multiple keys, without ending in Seq
Scans somewhere down
the line seems to require multi dimension indexes on all combinations of
colums

--
Jesper

In response to

Re: Evaluation of secondary sort key. at 2011-04-09 16:54:09 from Tom Lane

Responses

Re: Evaluation of secondary sort key. at 2011-04-18 09:00:57 from Greg Stark

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2011-04-18 05:43:50	Re: Formatting Curmudgeons WAS: MMAP Buffers
Previous Message	Greg Smith	2011-04-18 04:48:06	Re: Formatting Curmudgeons WAS: MMAP Buffers