Parametrization and UNION in view

From: Ronan Dunklau <ronan(dot)dunklau(at)people-doc(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Parametrization and UNION in view
Date: 2018-11-14 14:43:33
Message-ID: CAARsnT0Am0AVQpQNHidXqqEHtJOieG4P4u9YAzWvJJ17i6eNRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hello,

We've encountered a query which took forever on our database, and after
investigating why I managed to reduce the test case to something simple.

The problem is that the optimizer seems to fail to consider pushing a
predicate down a "unionized" view:

CREATE TABLE t1 AS SELECT i FROM generate_series(1, 100000) i;
CREATE INDEX ON t1 (i);
CREATE TABLE t2 AS SELECT i FROM generate_series(1, 100000) i;
CREATE INDEX ON t2 (i);
CREATE TABLE t3 AS SELECT i FROM generate_series(1, 2) i;
CREATE INDEX ON t3 (i);

CREATE VIEW v1 AS
SELECT * FROM t1
UNION
SELECT * FROM t2;

explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i);
QUERY PLAN
----------------------------------------------------------------------------
Hash Join (cost=6387.05..11157.05 rows=2000 width=4)
Hash Cond: (t1.i = t3.i)
-> HashAggregate (cost=6386.00..8386.00 rows=200000 width=4)
Group Key: t1.i
-> Append (cost=0.00..5886.00 rows=200000 width=4)
-> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=4)
-> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4)
-> Hash (cost=1.02..1.02 rows=2 width=4)
-> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=4)
(9 rows)

We can see that it choses to perform a join between the full output of the
view and the really small subset.

The optimizer can be forced to select a proper plan by introducing an
offset clause:

explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i
OFFSET 0);
QUERY
PLAN
----------------------------------------------------------------------------------------------
Seq Scan on t3 (cost=0.00..1966.07 rows=1 width=4)
Filter: (SubPlan 1)
SubPlan 1
-> HashAggregate (cost=982.51..992.51 rows=1000 width=4)
Group Key: t1.i
-> Append (cost=12.17..980.01 rows=1000 width=4)
-> Bitmap Heap Scan on t1 (cost=12.17..482.50 rows=500
width=4)
Recheck Cond: (i = t3.i)
-> Bitmap Index Scan on t1_i_idx (cost=0.00..12.04
rows=500 width=0)
Index Cond: (i = t3.i)
-> Bitmap Heap Scan on t2 (cost=12.17..482.50 rows=500
width=4)
Recheck Cond: (i = t3.i)
-> Bitmap Index Scan on t2_i_idx (cost=0.00..12.04
rows=500 width=0)
Index Cond: (i = t3.i)

Notice how the cost is much lower than the one of the previous plan, which
indicates to me that this plan is not even considered. If we raise the cost
of various operations (set enable_hashjoin = off, set enable_material =
off), we end up with a nonsensical nested loop:

explain SELECT * FROM t3 WHERE EXISTS (SELECT i FROM v1 WHERE v1.i = t3.i);
QUERY PLAN
----------------------------------------------------------------------------
Nested Loop (cost=6386.00..25773.02 rows=2000 width=4)
Join Filter: (t3.i = t1.i)
-> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=4)
-> HashAggregate (cost=6386.00..8386.00 rows=200000 width=4)
Group Key: t1.i
-> Append (cost=0.00..5886.00 rows=200000 width=4)
-> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=4)
-> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=4)

Should that be considered a bug ?

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-11-14 14:59:57 Re: [PATCH] Memory leak in pg_config
Previous Message Alvaro Herrera 2018-11-14 14:02:46 Re: pgsql: Add flag values in WAL description to all heap records