Planner issue on sorting joining of two tables with limit

From: Коротков Александр <aekorotkov(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner issue on sorting joining of two tables with limit
Date: 2010-04-25 18:22:29
Message-ID: k2j2e2e4b141004251122u7adab639ve203afe7a486673f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello, everybody!

I'm using PostgreSQL 8.4.3, compiled by Visual C++ build 1400, 32-bit on
Windows XP SP3.
I use following data model for issue reproducing.

CREATE TABLE test1
(
id integer NOT NULL,
"value" double precision,
CONSTRAINT test1_pkey PRIMARY KEY (id)
);

CREATE INDEX test1_value_idx ON test1(value);

CREATE TABLE test2
(
id integer NOT NULL,
id1 integer NOT NULL REFERENCES test2 (id),
"value" double precision,
CONSTRAINT test2_pkey PRIMARY KEY (id)
);

CREATE INDEX test2_id1_value_idx ON test2(id1, value);

Following statements generate 200 rows of test data into test1 table and
1000000 rows of test data into test2 table.

INSERT INTO test1 (id, value) (SELECT x, random() from
generate_series(1,200) x);

INSERT INTO test2 (id, id1, value) (SELECT x, (random()*200)::int + 1,
random() from generate_series(1,1000000) x);

The following statements return top 10 rows from joining of two tables
ordered by table1.value and table2.value. The

SELECT t1.value AS value1, t2.value AS value2 FROM test1 t1 JOIN test2 t2 ON
t2.id1 = t1.id ORDER BY t1.value, t2.value
LIMIT 10

value1 | value2
---------------------+---------------------
0.00562104489654303 | 0.00039379671216011
0.00562104489654303 | 0.000658359378576279
0.00562104489654303 | 0.000668979249894619
0.00562104489654303 | 0.000768951140344143
0.00562104489654303 | 0.00121330376714468
0.00562104489654303 | 0.00122168939560652
0.00562104489654303 | 0.00124016962945461
0.00562104489654303 | 0.00134057039394975
0.00562104489654303 | 0.00169069319963455
0.00562104489654303 | 0.00171623658388853
(10 rows)

The statement plan doesn't use indexes. So the statement it is slow.

Limit (cost=50614.88..50614.91 rows=10 width=16) (actual
time=8388.331..8388.382 rows=10 loops=1)
-> Sort (cost=50614.88..53102.45 rows=995025 width=16) (actual
time=8388.324..8388.340 rows=10 loops=1)
Sort Key: t1.value, t2.value
Sort Method: top-N heapsort Memory: 17kB
-> Hash Join (cost=6.50..29112.75 rows=995025 width=16) (actual
time=0.982..6290.516 rows=997461 loops=1)
Hash Cond: (t2.id1 = t1.id)
-> Seq Scan on test2 t2 (cost=0.00..15406.00 rows=1000000
width=12) (actualtime=0.088..2047.910 rows=1000000 loops=1)
-> Hash (cost=4.00..4.00 rows=200 width=12) (actual
time=0.870..0.870 rows=200 loops=1)
-> Seq Scan on test1 t1 (cost=0.00..4.00 rows=200
width=12) (actual time=0.010..0.428 rows=200 loops=1)
Total runtime: 8388.473 ms

I can remove ordering by test2.value.

SELECT t1.value AS value1, t2.value AS value2 FROM test1 t1 JOIN test2 t2 ON
t2.id1 = t1.id ORDER BY t1.value LIMIT 10

Then the result is the same.

value1 | value2
---------------------+---------------------
0.00562104489654303 | 0.00039379671216011
0.00562104489654303 | 0.000658359378576279
0.00562104489654303 | 0.000668979249894619
0.00562104489654303 | 0.000768951140344143
0.00562104489654303 | 0.00121330376714468
0.00562104489654303 | 0.00122168939560652
0.00562104489654303 | 0.00124016962945461
0.00562104489654303 | 0.00134057039394975
0.00562104489654303 | 0.00169069319963455
0.00562104489654303 | 0.00171623658388853
(10 rows)

The statement plan uses indexes and statement runs fast. This plan is
exactly what I need.

Limit (cost=0.00..0.62 rows=10 width=16) (actual time=0.049..0.148 rows=10
loops=1)
-> Nested Loop (cost=0.00..62109.86 rows=995025 width=16) (actual
time=0.044..0.107 rows=10 loops=1)
-> Index Scan using test1_value_idx on test1 t1 (cost=0.00..19.19
rows=200 width=12) (actual time=0.017..0.017 rows=1 loops=1)
-> Index Scan using test2_id1_value_idx on test2 t2
(cost=0.00..248.27 rows=4975 width=12) (actual time=0.013..0.042 rows=10
loops=1)
Index Cond: (t2.id1 = t1.id)
Total runtime: 0.224 ms

So PostgreSQL planner can produce the plan I need but it doesn't produce
this plan when I specify particular second ordering column. So is there any
way to make planner produce desired plan when particular second ordering
column is specified?

With best regards,
Korotkov Alexander.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vlad Arkhipov 2010-04-26 02:52:23 Re: Optimization idea
Previous Message Merlin Moncure 2010-04-24 15:20:12 Re: Replacing Cursors with Temporary Tables