Re: Planner issue on sorting joining of two tables with limit

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner issue on sorting joining of two tables with limit
Date: 2010-05-06 06:54:01
Message-ID: l2z2e2e4b141005052354p9bd81643j866bc5ce5af3ada5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I found my mistake. My supposition is working only if value column in t1
table is unique. But if I replace the index by unique one then plan is the
same.

On Mon, May 3, 2010 at 5:57 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:

> Well, no, because that plan wouldn't produce the specified ordering;
>> or at least it would be a lucky coincidence if it did. It's only
>> sorting on t1.value.
>>
> I just don't find why it is coincidence. I think that such plan will always
> produce result ordered by two columns, because such nested index scan always
> produce this result.
>
> Let's consider some simple example in order to illustrate how this plan
> works.
>
> t1
> id | value
> ---+------
> 1 | 0.1
> 2 | 0.3
> 3 | 0.2
>
> t2
> id | id1 | value
> ---+-----+------
> 1 | 2 | 0.2
> 2 | 1 | 0.9
> 3 | 2 | 0.6
> 4 | 1 | 0.7
> 5 | 1 | 0.4
> 6 | 3 | 0.2
>
> 1) The outer index scan will find the row of t1 with least value using
> test1_value_idx. It will be row (1, 0.1)
> 2) The inner index scan will find all the rows in t2 where id1 = 1 using
> test2_id1_value_idx. But index test2_id1_value_idx have second order by
> value column and the index scan result will be ordered by value. That's why
> inner index scan will find rows (5, 1, 0.4), (4, 1, 0.7) and (2, 1, 0.9).
> And following query output will be produced:
>
> value1 | value2
> --------+-------
> 0.1 | 0.4
> 0.1 | 0.7
> 0.1 | 0.9
> 3) The outer index scan will find the row of t1 with the second value using
> test1_value_idx. It will be row (3, 0.2)
> 4) The inner index scan will find all the rows in t2 where id1 = 3 using
> test2_id1_value_idx. This row is (6, 3, 0.2). The following query output
> will be produced:
>
> value1 | value2
> --------+-------
> 0.2 | 0.2
> 5) The outer index scan will find the row of t1 with the third value using
> test1_value_idx. It will be row (2, 0.3)
> 6) The inner index scan will find all the rows in t2 where id1 = 2 using
> test2_id1_value_idx. These rows are (1, 2, 0.2) and (3, 2, 0.6). The
> following query output will be produced:
>
> value1 | value2
> --------+-------
> 0.3 | 0.2
> 0.3 | 0.6
>
> And the whole query result is:
> value1 | value2
> --------+-------
> 0.1 | 0.4
> 0.1 | 0.7
> 0.1 | 0.9
> 0.2 | 0.2
> 0.3 | 0.2
> 0.3 | 0.6
>
> And this result is really ordered by t1.value, t2.value.
> I can't find error in my reasoning :)
>
> The query without limit produce similar plan.
>
> EXPLAIN SELECT t1.value AS value1, t2.value AS value2 FROM test1 t1 JOIN
> test2 t2 ON t2.id1 = t1.id ORDER BY t1.value
>
> Nested Loop (cost=0.00..62109.86 rows=995025 width=16)
> -> Index Scan using test1_value_idx on test1 t1 (cost=0.00..19.19
> rows=200 width=12)
> -> Index Scan using test2_id1_value_idx on test2 t2 (cost=0.00..248.27
> rows=4975 width=12)
> Index Cond: (t2.id1 = t1.id)
>
> And I checked that the result is ordered by t1.value and t2.value.
>
> 2010/4/26 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>
>> =?KOI8-R?B?68/Sz9TLz9cg4czFy9PBzsTS?= <aekorotkov(at)gmail(dot)com> writes:
>>
>> > So PostgreSQL planner can produce the plan I need but it doesn't produce
>> > this plan when I specify particular second ordering column.
>>
>> Well, no, because that plan wouldn't produce the specified ordering;
>> or at least it would be a lucky coincidence if it did. It's only
>> sorting on t1.value.
>>
>> > So is there any
>> > way to make planner produce desired plan when particular second ordering
>> > column is specified?
>>
>> Not when the ordering columns come from two different tables. (If they
>> were in the same table then scanning a two-column index could produce
>> the demanded sort order.) I don't see any way to satisfy this query
>> without an explicit sort step, which means it has to look at the whole
>> join output.
>>
>> If you're willing to make assumptions like "the required 10 rows will be
>> within the first 100 t1.value rows" then you could nest an ORDER BY
>> t1.value LIMIT 100 query inside something that did an ORDER BY with both
>> columns. But this fails if you have too many duplicate t1.value values,
>> and your test case suggests that you might have a lot of them. In any
>> case it would stop being fast if you make the inner LIMIT very large.
>>
>> regards, tom lane
>>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tatsuo Ishii 2010-05-06 07:09:25 Re: PgPool II configuration with PostgreSQL 8.4
Previous Message Marcos Ortiz 2010-05-06 06:29:51 Re: PgPool II configuration with PostgreSQL 8.4