Re: Evaluation of secondary sort key.

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, David Fetter <david(at)fetter(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Evaluation of secondary sort key.
Date: 2011-04-18 18:47:03
Message-ID: BANLkTinB8u12kSuhdGk7+T8zRt2w8xW6vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 18, 2011 at 5:38 PM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
>> order by case when (complex expresssion) 1 when (complex expression) 2
>> else 3
>
> How come that expression be relevant? There is only one sortkey and no
> limit, so no matter what it should clearly get the full resultset in all
> cases.

Sure, imagine there are more order by clauses with this one as the last one.

> Yes, as with all other cases it would be hard to get the optimum, but
> there is also cases where it is straightforward, say when the secondary
> sort column has an ndistinct of -1 (or similar close to). The current
> standard
> assumption is that 2 columns are unrelated, that would also work here. (As
> good as is
> does similar places in PG).

I'm not following what you mean with the secondary column having
ndistinct of -1. Actually it seems to me a reasonable low-hanging
fruit to reach for would be when the initial column has an ndistinct
of -1 which is actually kind of common.

A lot of SQL queries end up being written with GROUP BY primary_key,
other_column, other_column, other_column just to get those other
columns to be queryable. If we implemented the SQL standard
"dependent" columns feature this would be unnecessary but we don't and
even if we did people would still build schemas and queries that
defeat the optimization.

In these cases we probably do have ndistinct -1 for one of the columns
and therefore that an index on that column alone would give us almost
the right ordering and quite likely exactly the right ordering. If we
buffered the outputs for any distinct value and output sorted them if
there were multiple rows. It would probably somewhat worse if we guess
wrong though.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-04-18 18:59:32 Re: Evaluation of secondary sort key.
Previous Message Robert Haas 2011-04-18 17:57:20 Re: Formatting Curmudgeons WAS: MMAP Buffers