BUG #5885: Strange rows estimation for left join

From: "Maxim Boguk" <Maxim(dot)Boguk(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5885: Strange rows estimation for left join
Date: 2011-02-15 02:43:17
Message-ID: 201102150243.p1F2hHge058708@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5885
Logged by: Maxim Boguk
Email address: Maxim(dot)Boguk(at)gmail(dot)com
PostgreSQL version: 8.4.4
Operating system: Linux
Description: Strange rows estimation for left join
Details:

I found that strange effect while helping with slow query on russian
postgresql online forum.

To be short results looks like:

explain analyze
select *
from references rs
left join references vm on vm.reference = rs.reference and vm.attr_id =
20084
where rs.object_id = 9129863193713091717;

"Nested Loop Left Join (cost=0.00..1567.71 rows=2129718 width=112) (actual
time=14.654..14.672 rows=2 loops=1)"
" -> Index Scan using xif02references on references rs (cost=0.00..19.85
rows=11 width=56) (actual time=0.019..0.020 rows=2 loops=1)"
" Index Cond: (object_id = 9129863193713091717::numeric)"
" -> Index Scan using xif01references on references vm (cost=0.00..140.03
rows=55 width=56) (actual time=7.321..7.321 rows=0 loops=2)"
" Index Cond: ((vm.reference = rs.reference) AND (vm.attr_id =
20084::numeric))"
"Total runtime: 14.758 ms"

E.g. 6 orders of magnitude error in result rows selectivity (2129718 vs 2).

Table description:

CREATE TABLE references
(
attr_id numeric(20,0) NOT NULL,
reference numeric(20,0) NOT NULL,
object_id numeric(20,0) NOT NULL
);

CREATE INDEX xif01references
ON references
USING btree
(reference, attr_id);

CREATE INDEX xif02references
ON references
USING btree
(object_id, attr_id, reference);

Yes again unfortunate EAV model.

Related data from pg_stats

"schemaname" "public"
"tablename" "references"

"attname" "attr_id"
"null_frac" 0
"avg_width" 10
"n_distinct" 23
"most_common_vals"
"{20084;20085;9127479646713033746;11;14;15;9127479093313033036;457955;91273
06545213993525}"
"most_common_freqs" "{0.234067;0.233433;0.227433;0.0998333;0.0476667;0.0449;
0.0361;0.0273333;0.0247}"

d"attname" "reference"
"null_frac" 0
"avg_width" 14
"n_distinct" 56252
"most_common_vals"
"{9129830887313872119;9129830887313872121;9129787365613945251;9129676282313
943149;24332313945759;...}"
"most_common_freqs" "{0.2497;0.0138333;0.0002;0.000166667;0.000166667;...}"

"attname" "object_id"
"null_frac" 0
"avg_width" 15
"n_distinct" 1.23744e+06
"most_common_vals"
"{9129846527513534962;9129846051413838763;9129846154413978095;9129846403613
302858;9129846025513792413;...}"
"most_common_freqs" "{0.0003;0.000266667;0.000266667;0.000233333;0.0002;0.00
02;...}"

What seems very strange for me is final esctimation 2M when estimated 11
rows on right side of the join and 55 rows on left side of the join (so
final rows must be 55*11 ~ 600 rows even with worst case of left join
selectivity=1).

May be that effect already fixed in 8.4.5+ (but I had readed changes list
since 8.4.4 and I don't find any related changes).

PS: Sorry if I completely missunderstood of planner mechanics in left join
rows estimation.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2011-02-15 03:27:31 Re: BUG #5885: Strange rows estimation for left join
Previous Message Pavel Stehule 2011-02-14 17:43:47 Re: BUG #5884: i cant select entire array values