Bogus nestloop rows estimate in 8.4.7

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Bogus nestloop rows estimate in 8.4.7
Date: 2012-05-28 07:45:06
Message-ID: CABRT9RARmcPq-2_Q84xXcp7JCND0SdkGBwRB4t1oX5rdXTZNKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi list,

This bug isn't causing me any immediate problems -- the plan works out
well regardless -- but PostgreSQL 8.4.7 is somehow overestimating the
number of rows coming from a nestloop join, when joining 2 large
partitioned tables. Maybe it's been fixed in more recent versions,
sadly it's an EOL Linux distro and I have no immediate plans to
upgrade.

It's estimating to join 135957 x 281 rows, but the product is somehow
2946151270877

In reality, it's joining 132577 x ~1 rows to get 133116 results

QUERY PLAN
GroupAggregate (cost=852067259163.57..977278688175.85
rows=2946151270877 width=36)
-> Sort (cost=852067259163.57..859432637340.77 rows=2946151270877 width=36)
Sort Key: b.banner_id, b.client_body_id,
(COALESCE(b.partner_body_id, a.partner_body_id)), b.space_id,
b.campaign_id, a.evt_type_id
-> Nested Loop (cost=0.00..213859871.55 rows=2946151270877 width=36)
Join Filter: (a.request_id = b.request_id)
-> Append (cost=0.00..5905.69 rows=135957 width=20)
-> Index Scan using "XIF01request" on request a
(cost=0.00..8.27 rows=1 width=20)
Index Cond: ((request_time >= '2012-05-28
09:00:00'::timestamp without time zone) AND (request_time <
'2012-05-28 10:00:00'::timestamp without time zone))
-> Index Scan using "XIF01request_1222" on
request_1222 a (cost=0.00..5897.42 rows=135956 width=20)
Index Cond: ((request_time >= '2012-05-28
09:00:00'::timestamp without time zone) AND (request_time <
'2012-05-28 10:00:00'::timestamp without time zone))
-> Append (cost=0.00..1569.44 rows=281 width=32)
-> Seq Scan on request_data b (cost=0.00..11.30
rows=130 width=32)
-> Index Scan using
"IX_relationship64_request_d_c_1150" on request_d_c_1150 b
(cost=0.00..9.56 rows=2 width=32)
Index Cond: (b.request_id = a.request_id)
*snip lots of partition index scans*

Query:
SELECT '2012-05-28T09:00:00', count(*),
uniq(sort(array_agg(visitor_id))), banner_id, client_body_id,
partner_body_id, space_id, campaign_id, evt_type_id FROM stats_request
WHERE stats_request.request_time >= '2012-05-28T09:00:00' AND
stats_request.request_time < (timestamp '2012-05-28T09:00:00' +
interval E'1 hour')::timestamp
GROUP BY banner_id, client_body_id, partner_body_id, space_id,
campaign_id, evt_type_id ORDER BY banner_id, client_body_id,
partner_body_id, space_id, campaign_id, evt_type_id;

Full EXPLAIN ANALYZE is attached.

Regards,
Marti

Attachment Content-Type Size
explain-analyze.txt text/plain 36.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marti Raudsepp 2012-05-28 08:18:22 Re: Bogus nestloop rows estimate in 8.4.7
Previous Message Sandro Santilli 2012-05-28 06:48:21 Re: Interrupting long external library calls