Planner producing 100% duplicate subplans when unneeded

From: Daniel Grace <dgrace(at)wingsnw(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Planner producing 100% duplicate subplans when unneeded
Date: 2010-09-27 21:09:14
Message-ID: AANLkTikU6k1BgcMBcgkgOGNbSxq-juHtKMPRJP5BxS9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

This is all on Postgres 9.0.0:

I'm working on the definition for a view that, as part of its output,
includes three columns that each contain a sum of values in a table
that are in one of a few different states -- essentially: for each
parent, give me the sum of children in the foo state, a second sum of
the children in the bar state, and a third sum of the children in the
baz state.

Thinking I'd be clever and avoid multiple almost-identical subqueries
to for each SUM, I decided to do it in a single subquery that returns
an array of all 3 sums and then split it out into its component parts
higher up (see the example below if you don't understand what I mean).
Unfortunately, the query planner doesn't quite handle this case: For
each reference to the subquery value, it duplicates the subquery and
thus its plan:

CREATE TABLE parent (
id INT PRIMARY KEY
);
INSERT INTO parent(id) SELECT GENERATE_SERIES(1, 1000);

CREATE TABLE child (
parent_id INT,
v1 INT,
v2 INT,
PRIMARY KEY(parent_id, v1)
);

INSERT INTO child(parent_id, v1, v2)
SELECT p.id, v1.id, RANDOM() * 500
FROM
GENERATE_SERIES(1, 1000) AS p(id)
CROSS JOIN GENERATE_SERIES(1, 20) AS v1(id)
;

EXPLAIN SELECT *, child_data[1] AS foo, child_data[2] AS bar,
child_data[3] AS baz FROM (
SELECT p.id,
(
SELECT
ARRAY[
SUM(c.v2),
SUM(CASE WHEN c.v1 > 15 THEN c.v2 ELSE 0 END),
SUM(CASE WHEN c.v1 > 5 THEN c.v2 ELSE 0 END)
]
FROM child AS c
WHERE c.parent_id=p.id
) AS child_data
FROM parent AS p
) AS t;

produces:

Seq Scan on wings_sky.parent p (cost=0.00..161113.12 rows=1000 width=4)
Output: p.id, (SubPlan 1), ((SubPlan 2))[1], ((SubPlan 3))[2], ((SubPlan 4))[3]
SubPlan 1
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 2
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 3
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 4
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)

There's another method of writing this that is more efficient in this
PARTICULAR case:

EXPLAIN VERBOSE SELECT parent.id, t.foo, t.bar, t.baz
FROM
parent
INNER JOIN (
SELECT
child.parent_id,
SUM(child.v2) AS foo,
SUM(CASE WHEN child.v1 > 15 THEN child.v2 ELSE 0 END) AS bar,
SUM(CASE WHEN child.v1 > 5 THEN child.v2 ELSE 0 END) AS baz
FROM
child
GROUP BY
child.parent_id
) AS t ON t.parent_id=parent.id

Hash Join (cost=547.12..556.62 rows=200 width=28)
Output: parent.id, (sum(child.v2)), (sum(CASE WHEN (child.v1 > 15)
THEN child.v2 ELSE 0 END)), (sum(CASE WHEN (child.v1 > 5) THEN
child.v2 ELSE 0 END))
Hash Cond: (child.parent_id = parent.id)
-> HashAggregate (cost=483.12..487.62 rows=200 width=12)
Output: child.parent_id, sum(child.v2), sum(CASE WHEN (child.v1 > 15)
THEN child.v2 ELSE 0 END), sum(CASE WHEN (child.v1 > 5) THEN child.v2
ELSE 0 END)
-> Seq Scan on wings_sky.child (cost=0.00..291.06 rows=19206 width=12)
Output: child.parent_id, child.v1, child.v2
-> Hash (cost=34.00..34.00 rows=2400 width=4)
Output: parent.id
-> Seq Scan on wings_sky.parent (cost=0.00..34.00 rows=2400 width=4)
Output: parent.id

However, the second plan performs poorly in cases in the case where
parent_id can be more than one possible value (i.e. there is no WHERE
parent_id=1 clause) -- it takes nearly as long in that instance as it
does with no WHERE clause altogether. Both plans run the same speed
with one parent_id. The first plan starts losing speed gradually as
the number of parents increase; the second plan is either
all-or-nothing.

In the first case, it seems inefficient to duplicate the subplan for
each reference -- I'd think the (corrected) plan should look something
like this:

Seq Scan on wings_sky.parent p (cost=0.00..161113.12 rows=1000 width=4)
Output: p.id, (SubPlan 1), ((SubPlan 1))[1], ((SubPlan 1))[2], ((SubPlan 1))[3]
SubPlan 1
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)

Is there any chance this might be looked at in a future release?

--
Daniel Grace
AGE, LLC
System Administrator and Software Developer
dgrace(at)wingsnw(dot)com // www.wingsnw.com

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Samuel Gendler 2010-09-27 23:01:53 Re: [BUGS] Mapping Hibernate boolean to smallint(Postgresql)
Previous Message Alexey Parshin 2010-09-27 20:56:01 BUG #5680: Failure to start: too many private dirs demanded