Clamping reulst row number of joins.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Clamping reulst row number of joins.
Date: 2015-03-06 08:33:41
Message-ID: 20150306.173341.188111041.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, I had a report that a query gets wired estimated row
numbers and it makes subsequent plans wrong.

Although the detailed explanation is shown later in this mail,
the problem here was that a query like following makes the path
with apparently wrong row number.

EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
JOIN (VALUES (1)) tt(x) ON tt.x = t.a;

0> Nested Loop (cost=0.43..8.51 rows=68732 width=4)
1> -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
2> -> Append (cost=0.43..8.48 rows=2 width=4)
3> -> Index Only Scan using i_t1_a on t1
4> (cost=0.43..8.46 rows=1 width=4)
5> Index Cond: (a = "*VALUES*".column1)
6> -> Subquery Scan on "*SELECT* 2" (cost=0.00..0.02 rows=1 width=4)
7> Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
8> -> Result (cost=0.00..0.01 rows=1 width=0)

The Nested Loop at line 0 says that it emits 68832 rows even
though the underlying nodes, Values at line 1 and Append at line
2 are giving 1 and 2 respectively as their reults. This query
actually returns only 1 row. It is seemingly wrong and causes the
plans above go wrong.

This is caused by the Append node, which don't have statistics,
so eqjoinsel takes 0.5 as the default selectivity for the nested
loop. Even though it is defficult to get statistics for Append
node, the nested loop could get more sane row nubmer if it
doesn't ignore the parameterized path selected for the scan on
t1.

I suppose that join nodes can safely clamp the number of its
result rows by the product of the row numbers of underlying two
paths. And if it is correct, the attached patch can correct the
estimation like this.

> Nested Loop (cost=0.43..16.51 rows=2 width=4)
> -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
> -> Append (cost=0.43..16.48 rows=2 width=4)
> -> Index Only Scan using i_t1_a on t1
> (cost=0.43..16.45 rows=1 width=4)
> Index Cond: (a = "*VALUES*".column1)
> -> Subquery Scan on "*SELECT* 2" (cost=0.00..0.02 rows=1 width=4)
> Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
> -> Result (cost=0.00..0.01 rows=1 width=0)

Is it correct? And Does it have no bad side effects?
And is this applicable for 9.5?

The detailed test environment is described below.

Before this fix, the inner path of the top-level join is doing
hash because the inner path says that it will give 68732
rows. (But only 1 actually).

After this fix, the outer path declaring that it gives 2 rows so
the top-level join becomes nested loop and the inner path becomes
index scan.

=======
CREATE TABLE t1 (a int);
INSERT INTO t1 (SELECT (a%2)*500000 + a / 2 FROM generate_series(0, 1000000) a);
CREATE INDEX i_t1_a ON t1 (a);
CREATE TABLE t2 AS SELECT * from t1;
ANALYZE t1;
ANALYZE t2;
-- emulating the problematic situation
SET join_collapse_limit TO 1;
SET random_page_cost TO 8.0;
SET seq_page_cost TO 0.5;

EXPLAIN
SELECT tt2.a
FROM (SELECT a
FROM (SELECT a FROM t1
UNION ALL SELECT 0) t
JOIN (VALUES (1)) tt(x) ON tt.x = t.a) tt2
JOIN t2 ON (tt2.a = t2.a);

======== The plan before fix
Hash Join (cost=30832.46..36230.60 rows=68732 width=4)
(actual time=1258.880..1260.519 rows=1 loops=1)
Hash Cond: ("*VALUES*".column1 = t2.a)
-> Nested Loop (cost=0.43..8.51 rows=68732 width=8)
(actual time=0.071..0.104 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
(actual time=0.002..0.003 rows=1 loops=1)
-> Append (cost=0.43..8.48 rows=2 width=4)
(actual time=0.063..0.093 rows=1 loops=1)
-> Index Only Scan using i_t1_a on t1
(cost=0.43..8.46 rows=1 width=4)
(actual time=0.059..0.075 rows=1 loops=1)
Index Cond: (a = "*VALUES*".column1)
Heap Fetches: 2
-> Subquery Scan on "*SELECT* 2"
(cost=0.00..0.02 rows=1 width=4)
(actual time=0.010..0.010 rows=0 loops=1)
Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
Rows Removed by Filter: 1
-> Result (cost=0.00..0.01 rows=1 width=0)
(actual time=0.001..0.002 rows=1 loops=1)
-> Hash (cost=14425.01..14425.01 rows=1000001 width=4)
(actual time=1250.274..1250.274 rows=1000001 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3227kB
-> Seq Scan on t2 (cost=0.00..14425.01 rows=1000001 width=4)
(actual time=0.021..608.245 rows=1000001 loops=1)
Planning time: 0.319 ms
Execution time: 1260.792 ms

======== The plan after fix
Nested Loop (cost=0.86..17.43 rows=2 width=4)
(actual time=0.103..0.118 rows=1 loops=1)
Join Filter: ("*VALUES*".column1 = t2.a)
-> Nested Loop (cost=0.43..16.51 rows=2 width=8)
(actual time=0.062..0.073 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
(actual time=0.003..0.004 rows=1 loops=1)
-> Append (cost=0.43..16.48 rows=2 width=4)
(actual time=0.051..0.060 rows=1 loops=1)
-> Index Only Scan using i_t1_a on t1
(cost=0.43..16.45 rows=1 width=4)
(actual time=0.045..0.046 rows=1 loops=1)
Index Cond: (a = "*VALUES*".column1)
Heap Fetches: 1
-> Subquery Scan on "*SELECT* 2"
(cost=0.00..0.02 rows=1 width=4)
(actual time=0.005..0.005 rows=0 loops=1)
Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
Rows Removed by Filter: 1
-> Result (cost=0.00..0.01 rows=1 width=0)
(actual time=0.002..0.002 rows=1 loops=1)
-> Index Only Scan using i_t2_a on t2 (cost=0.42..0.45 rows=1 width=4)
(actual time=0.026..0.027 rows=1 loops=1)
Index Cond: (a = t1.a)
Heap Fetches: 1
Planning time: 0.968 ms
Execution time: 0.239 ms
=========

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Clamp-row-number-of-join-product-by-the-row-number-c.patch text/x-patch 2.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2015-03-06 09:04:30 Re: INSERT ... ON CONFLICT UPDATE and RLS
Previous Message Noah Misch 2015-03-06 08:33:28 Re: Rethinking pg_dump's function sorting code