planner chooses unoptimal plan on joins with complex key

From: "Dmitry Potapov" <fortune(dot)fish(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: planner chooses unoptimal plan on joins with complex key
Date: 2008-01-23 13:48:16
Message-ID: 878c83960801230548i6bee9353m19bcee7335220c9b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I've got two huge tables with one-to-many relationship with complex
key. There's also a view, which JOINs the tables, and planner chooses
unoptimal plan on SELECTs from this view.

The db schema is declared as: (from on now, I skip the unsignificant
columns for the sake of simplicity)

CREATE TABLE t1 (
id integer NOT NULL,
m1 integer NOT NULL DEFAULT 0,
m2 bigint NOT NULL DEFAULT 0,
m3 bigint NOT NULL DEFAULT 0,
time_stamp timestamp without time zone DEFAULT now() NOT NULL,
[...skipped...]
);

CREATE TABLE t2 (
id integer NOT NULL,
m1 integer NOT NULL DEFAULT 0,
m2 bigint NOT NULL DEFAULT 0,
m3 bigint NOT NULL DEFAULT 0,
time_stamp timestamp without time zone DEFAULT now() NOT NULL,
[...skipped...]
);

CREATE VIEW t1t2_view AS SELECT ..., t1.m1, t1.m2, t1.m3,
t1.time_stamp FROM t1 JOIN t2 on ( (t1.m1=t2.m1) AND (t1.m2=t2.m2)
AND (t1.m3=t2.m3));

CREATE UNIQUE INDEX i_t1_ms ON t1(m1,m2,m3);
CREATE INDEX i_t1_ts ON t1(time_stamp);
CREATE INDEX i_t2_ms ON t2(m1,m2,m3);

Table t1 contains ~20M rows, t2 contains ~30M rows. The complex key
that ties one table to another is implied, i.e. (m1,m2,m3) isn't
declared as foreign key. There's a reason for that: an app needs to
push lots of INSERTs to these tables pretty quickly, and additional
foreign key constraint check will kill the performance.

So, here's the query in question:

SELECT * FROM t1t2_view ORDER BY time_stamp ASC LIMIT 100;

EXPLAIN ANALYZE SELECT * FROM t1t2_view ORDER BY time_stamp ASC LIMIT 100:

Limit (cost=13403340.40..13403340.40 rows=1 width=152)
-> Sort (cost=13403340.40..13403340.40 rows=1 width=152)
Sort Key: t1.time_stamp
-> Merge Join (cost=6663466.28..13403340.39 rows=1 width=152)
Merge Cond: ((t1.m1 = t2.m1) AND (t1.m2 = t2.m2) AND
(t1.m3 = t2.m3))
-> Index Scan using i_t1_ms on t1
(cost=0.00..6272009.52 rows=21639880 width=121)
-> Sort (cost=6663466.28..6739884.33 rows=30567222 width=51)
Sort Key: t2.m1, t2.m2, t2.m3
-> Seq Scan on t2 (cost=0.00..922814.22
rows=30567222 width=51)

When I set enable_sort and enable_mergejoin to off, the planner
chooses better plan:

EXPLAIN ANALYZE SELECT * FROM t1t2_view ORDER BY time_stamp ASC LIMIT 100

Limit (cost=0.00..175299576.86 rows=1 width=152)
-> Nested Loop (cost=0.00..175299576.86 rows=1 width=152)
-> Index Scan using i_t1_ts on t1 (cost=0.00..1106505.70
rows=21642342 width=121)
-> Index Scan using i_t2_ms on t2 (cost=0.00..8.03 rows=1 width=51)
Index Cond: ((t1.m1 = t2.m1) AND (t1.m2 = t2.m2) AND
(t1.m3 = t2.m3))

The problem here is, as far as I understand, is the wrong estimate of
row count in join result table.

Postgresql version is 8.2.5. The tables are ANALYZEd, Changing
default_statistics_target from 10 to 100, and even 300 doesn't affect
planner's behaviour.

Is there any possibility to make the planner to choose an optimal plan
without turning off enable_sort and enable_mergejoin?

Thanks in advance.

--
Regards,
Dmitry

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Guillaume Smet 2008-01-23 13:58:52 Re: planner chooses unoptimal plan on joins with complex key
Previous Message Guillaume Smet 2008-01-23 12:36:44 *_cost recommendation with 8.3 and a fully cached db