Problem with ORDER BY and DISTINCT ON

From: Steve Midgley <public(at)misuse(dot)org>
To: pgsql-sql(at)postgresql(dot)org
Subject: Problem with ORDER BY and DISTINCT ON
Date: 2008-07-16 07:39:47
Message-ID: 20080716073955.EB1DF64FCBD@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi,

I'm a little baffled. I'm trying to generate a SQL statement that
issues a DISTINCT ON using the same values as my ORDER BY statement.
I'm using a somewhat complex CASE statement in my ORDER BY clause. I'm
on Pg 8.2. Here is some SQL to get you started at seeing my problem:

------------------

drop table if exists property;
create table property
( id serial,
state varchar(255),
search_rate_max decimal(8,2),
data_priority_code varchar(255)
);

SELECT DISTINCT ON
("property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
property.id)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
property.id
LIMIT 10 OFFSET 0

----------------
RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER
BY expressions
SQL state: 42P10
----------------

Now if you run this statement it works

------------------

SELECT DISTINCT ON
("property"."state",
property.id)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
property.id
LIMIT 10 OFFSET 0

------------------

However if you run this statement it ALSO works, which tells me it's
not just my CASE statements that are messing things up (note in this
example, I just removed the primary key "property.id" from the ORDER BY
and DISTINCT ON clauses:

-------------------

SELECT DISTINCT ON
("property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max"
)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max"
LIMIT 10 OFFSET 0

--------------------
RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER
BY expressions
SQL state: 42P10
--------------------

Finally, if you run this statement it works fine (removing one of the
duplicate search_rate_max statements):

--------------------

SELECT DISTINCT ON
("property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
property.id)
property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY
"property"."state",
CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2
END,"search_rate_max",
property.id LIMIT 10 OFFSET 0

--------------------

What's going on here? Am I doing something that isn't legitimate SQL? I
can't see why having a duplicate CASE statement should foul things up
like this? It's pretty clear (from additional testing not included in
this email) that the duplicate "search_rate_max" CASE is causing the
problem.

Thanks for any advice or suggestions on how to get this to run
correctly. Is this a bug?

Basically I'm doing this as an optimization - I can get much better
performance running the DISTINCT ON in some circumstances than using
DISTINCT, but the edge case above is breaking my tests and preventing
me from implementing the idea. The code is generated by an application
layer which is not really paying attention to whether or not the two
CASE statements apply to the same field or not (sometimes they do
sometimes they don't)..

Thanks!

Steve

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Ivan Sergio Borgonovo 2008-07-16 09:06:11 integrity check and visibility was: COPY equivalent for updates
Previous Message A. Kretschmer 2008-07-16 05:39:11 Re: How to GROUP results BY month