Re: Pushing IN (subquery) down through UNION ALL?

From: Dave Johansen <davejohansen(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Pushing IN (subquery) down through UNION ALL?
Date: 2011-02-24 16:38:56
Message-ID: AANLkTime=f7bSHKueDYXJQVOn+D_c4XSoE9n358fahnf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Feb 24, 2011 at 8:14 AM, Dave Johansen <davejohansen(at)gmail(dot)com>wrote:

> I'm using PostgreSQL 8.3.3 and I have a view that does a UNION ALL on two
> joins and it doesn't seem to want to push the IN (subquery) optimization
> down into the plan for the two queries being unioned. Is there something I
> can do to fix this? Or is it just a limitation of the planner/optimizer?
>
> I also tried this with 8.4.7 and it seemed to exhibit the same behaviour,
> so here's an example of what I'm talking about (obviously in a real system
> I'd have indexes and all that other fun stuff):
>
> CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT);
> CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
> CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
> CREATE VIEW addressesall AS SELECT u.id, u.name, a.value FROM addresses1
> AS a JOIN users AS u ON a.userid=u.id UNION ALL SELECT u.id, u.name,
> a.value FROM addresses2 AS a JOIN users AS u ON a.userid=u.id;
>
>
> Here's the EXPLAIN output for two example queries:
>
> test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (SELECT id
> FROM users WHERE name='A');
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------
> Hash Semi Join (cost=2.15..5.58 rows=1 width=40) (actual
> time=0.144..0.340 rows=3 loops=1)
> Hash Cond: (u.id = users.id)
> -> Append (cost=1.09..4.48 rows=9 width=40) (actual time=0.059..0.239
> rows=9 loops=1)
> -> Hash Join (cost=1.09..2.19 rows=4 width=10) (actual
> time=0.055..0.075 rows=4 loops=1)
> Hash Cond: (a.userid = u.id)
> -> Seq Scan on addresses1 a (cost=0.00..1.04 rows=4
> width=8) (actual time=0.006..0.013 rows=4 loops=1)
> -> Hash (cost=1.04..1.04 rows=4 width=6) (actual
> time=0.019..0.019 rows=4 loops=1)
> -> Seq Scan on users u (cost=0.00..1.04 rows=4
> width=6) (actual time=0.003..0.008 rows=4 loops=1)
> -> Hash Join (cost=1.09..2.21 rows=5 width=10) (actual
> time=0.109..0.133 rows=5 loops=1)
> Hash Cond: (a.userid = u.id)
> -> Seq Scan on addresses2 a (cost=0.00..1.05 rows=5
> width=8) (actual time=0.004..0.012 rows=5 loops=1)
> -> Hash (cost=1.04..1.04 rows=4 width=6) (actual
> time=0.020..0.020 rows=4 loops=1)
> -> Seq Scan on users u (cost=0.00..1.04 rows=4
> width=6) (actual time=0.004..0.010 rows=4 loops=1)
> -> Hash (cost=1.05..1.05 rows=1 width=4) (actual time=0.053..0.053
> rows=1 loops=1)
> -> Seq Scan on users (cost=0.00..1.05 rows=1 width=4) (actual
> time=0.032..0.040 rows=1 loops=1)
> Filter: (name = 'A'::text)
> Total runtime: 0.519 ms
> (17 rows)
>
> test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (1);
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------
> Result (cost=0.00..4.27 rows=3 width=40) (actual time=0.053..0.114 rows=3
> loops=1)
> -> Append (cost=0.00..4.27 rows=3 width=40) (actual time=0.049..0.101
> rows=3 loops=1)
> -> Nested Loop (cost=0.00..2.12 rows=2 width=10) (actual
> time=0.046..0.063 rows=2 loops=1)
> -> Seq Scan on users u (cost=0.00..1.05 rows=1 width=6)
> (actual time=0.025..0.028 rows=1 loops=1)
> Filter: (id = 1)
> -> Seq Scan on addresses1 a (cost=0.00..1.05 rows=2
> width=8) (actual time=0.009..0.017 rows=2 loops=1)
> Filter: (a.userid = 1)
> -> Nested Loop (cost=0.00..2.12 rows=1 width=10) (actual
> time=0.015..0.025 rows=1 loops=1)
> -> Seq Scan on addresses2 a (cost=0.00..1.06 rows=1
> width=8) (actual time=0.005..0.008 rows=1 loops=1)
> Filter: (userid = 1)
> -> Seq Scan on users u (cost=0.00..1.05 rows=1 width=6)
> (actual time=0.004..0.007 rows=1 loops=1)
> Filter: (u.id = 1)
> Total runtime: 0.251 ms
> (13 rows)
>
> You'll notice that the subquery version is doing the full join and then the
> filtering, but the explicitly listed version pushing the filtering into the
> plan before the join. Is there a way to make the subquery version perform
> the same optimization?
>
> Thanks,
> Dave
>

I also just noticed that an ORDER BY x LIMIT n optimization is not pushed
down through the UNION ALL as well. I understand that this may be a little
trickier because the ORDER BY and LIMIT would need to be applied to the
subqueries and then re-applied after the APPEND, but is there some way to
get either the previous issue or this issue to optimize as desired? Or do I
just need to change my schema to not use two separate tables with a VIEW and
a UNION ALL?

Thanks again,
Dave

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2011-02-24 18:13:12 Re: Unused indices
Previous Message Tom Lane 2011-02-24 15:58:53 Re: Function execution consuming lot of memory and eventually making server unresponsive