EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
Date: 2016-06-21 20:18:06
Message-ID: CAHyXU0yTy1Q7Qe-rozHfejkTYg0LHWkHT7k7A+3H=xmjbs6PmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

Observe the following test case (apologies if this is a well
understood problem):

create temp table foo as select generate_series(1,1000000) id;
create index on foo(id);

create temp table bar as select id, id % 100000 = 0 as good from
generate_series(1,1000000) id;
create index on bar(good);

analyze foo;
analyze bar;

explain analyze select * from foo where false or exists (select 1 from
bar where good and foo.id = bar.id); -- A
explain analyze select * from foo where exists (select 1 from bar
where good and foo.id = bar.id); -- B

These queries are trivially verified as identical but give very different plans.
A gives
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Seq Scan on foo (cost=0.00..4459425.00 rows=500000 width=4) (actual
time=13.299..130.271 rows=10 loops=1)
Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
Rows Removed by Filter: 999990
SubPlan 1
-> Index Scan using bar_good_idx on bar (cost=0.42..4.45 rows=1
width=0) (never executed)
Index Cond: (good = true)
Filter: (good AND (foo.id = id))
SubPlan 2
-> Index Scan using bar_good_idx on bar bar_1 (cost=0.42..4.44
rows=1 width=4) (actual time=0.024..0.055 rows=10 loops=1)
Index Cond: (good = true)
Filter: good
Planning time: 0.103 ms
Execution time: 130.312 ms

B gives
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=4.87..12.91 rows=1 width=4) (actual
time=0.075..0.161 rows=10 loops=1)
-> HashAggregate (cost=4.45..4.46 rows=1 width=4) (actual
time=0.058..0.060 rows=10 loops=1)
Group Key: bar.id
-> Index Scan using bar_good_idx on bar (cost=0.42..4.44
rows=1 width=4) (actual time=0.018..0.045 rows=10 loops=1)
Index Cond: (good = true)
Filter: good
-> Index Only Scan using foo_id_idx on foo (cost=0.42..8.44
rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=10)
Index Cond: (id = bar.id)
Heap Fetches: 10
Planning time: 0.193 ms
Execution time: 0.187 ms

This is a general problem to OR expressions while AND expressions will
generally pass the optimization through. The 'old school'
optimization approach is to rewrite the OR expressions to UNION ALL
but this can have unpleasant downstream effects on the query in real
world scenarios. The question is: can the one time filter logic be
expanded such the first query can be functionally be written into the
second one? This type of query happens a lot when trying to mix
multiple different filtering expressions (a 'filter mode' if you will)
in a single query based on a user supplied switch. Food for thought.

merlin

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-06-21 20:32:03 Re: Reviewing freeze map code
Previous Message Alvaro Herrera 2016-06-21 20:09:22 Re: MultiXactId error after upgrade to 9.3.4