Re: [HACKERS] distinct. Is this the correct behaviour?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vince Vielhaber <vev(at)michvhf(dot)com>
Cc: hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] distinct. Is this the correct behaviour?
Date: 1999-10-21 01:10:04
Message-ID: 15107.940468204@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Vince Vielhaber <vev(at)michvhf(dot)com> writes:
> Is this the way distinct is supposed to work? My intent is to give
> only one for each different value of x - like it does in the first
> distinct example. But when order by is added for the date/time sort
> I get what you see in the second distinct example.

Yeah, I think it's a bug too. It's not quite clear what to change,
though.

The "problem" is that nodeUnique is doing a bitwise compare across the
whole tuple, including the hidden ('junk') y column that is needed to do
the sorting. So, because you have four different y values, you get four
rows out.

However, if we fix nodeUnique to ignore junk columns, then the result
becomes nondeterministic. Consider

x y

1 3
1 5
2 4

If we do "select distinct x from foo order by y" on this data, then the
order of the result depends on which of the two tuples with x=1 happens
to get chosen by the Unique filter. This is not good.

SQL92 gets around this by allowing ORDER BY only on columns of the
targetlist, so that you are not allowed to specify this query in the
first place.

I think it is useful to allow ORDER BY on hidden columns, but maybe we
need to forbid it when DISTINCT is present. If we do that then the
implementation of nodeUnique is OK as it stands, and the bug is that
the parser accepts an invalid query.

This is pretty closely related to the semantic problems of DISTINCT ON,
once you see that the trouble is having columns in the query that aren't
being used for (or aren't supposed to be used for) the DISTINCT check.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-10-21 01:15:20 Re: [HACKERS] translate function (BUG?)
Previous Message Lincoln Yeoh 1999-10-21 01:08:15 Re: [GENERAL] Postgres INSERTs much slower than MySQL?