Re: Preserving order through an inner join

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Kevin Jardine <kevinjardine(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Preserving order through an inner join
Date: 2010-09-28 04:28:24
Message-ID: 4CA16EE8.5040406@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 09/27/2010 03:37 AM, Tom Lane wrote:
> Kevin Jardine<kevinjardine(at)yahoo(dot)com> writes:
>> I have a query structured like this:
>> SELECT stuff FROM
>> (SELECT more stuff FROM
>> table1
>> ORDER BY field1) AS q1
>> INNER JOIN table2 ON ( ... )
>
>> and have found that the INNER JOIN is ignoring the order set for q1.
>
>> The final results are not ordered by field1.
>
> Indeed. Many of the possible join techniques won't preserve that ordering.

Just to elaborate on this for the OP:

When joining two data sets, PostgreSQL has a number of choices about how
to do it. Some of these are:

- Merge join (http://en.wikipedia.org/wiki/Sort-merge_join)
- Hash join (http://en.wikipedia.org/wiki/Hash_join)
- Nested loop + index/sequential table scans
(http://en.wikipedia.org/wiki/Nested_loop_join)

As you will see from the descriptions and how they work, many join
algorithms do not preserve the order of the input relations. Different
join types are optimal for different absolute and relative sizes of
input relations and different join conditions; all of them are useful in
one situation or another. Only using joins that preserved the natural
order of the relations (which is undefined as per the SQL spec and 99%
of the time people don't care about anyway) would be pretty nasty for
performance.

You could potentially force preservation of order in your particular
query by telling PostgreSQL to use a nested loop with your subquery as
the outer loop, but it'd require overriding the query planner's join
method and join order optimization in ugly ways that aren't well supported.

Not only that, but you'll have to do that kind of hacking-around if you
target any other database that supports at least hash joins, and you'll
have to do it using database-specific query hints, configuration
options, session variables, etc.

I hope this helps to explain in a bit more detail why it's worth putting
in the effort to update your code.

--
Craig ringer

In response to

Browse pgsql-general by date

  From Date Subject
Next Message John R Pierce 2010-09-28 04:38:46 Re: Merge replication with Postgresql on Windows?
Previous Message Satoshi Nagayasu 2010-09-28 04:06:19 Re: ECPG - Some errno definitions don't match to the manual