Re: IS NOT NULL and LEFT JOIN

From: Laurent Martelli <laurent(dot)martelli(at)enercoop(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: IS NOT NULL and LEFT JOIN
Date: 2014-10-20 09:29:35
Message-ID: 5444D5FF.5070407@enercoop.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi David,

Do we agree that both queries are identical ? Since we join on
c.user_info=u.id <http://u.id> having u.id <http://u.id> is not null or
c.user_info is not null in the where clause is the same, isn't it ?

Since c.user_info=u.id <http://u.id> the condition onu.id is not null
does not use any *new* information from user_user_info.

Regards,
Laurent

Le 19/10/2014 10:41, David Rowley a écrit :
> On Sun, Oct 19, 2014 at 5:10 PM, Laurent Martelli
> <laurent(dot)martelli(at)enercoop(dot)org <mailto:laurent(dot)martelli(at)enercoop(dot)org>>
> wrote:
>
> Hello there,
>
> I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.
>
> I grant you that the query can be written without the JOIN on
> user_user_info,
> but it is generated like this by hibernate. Just changing the IS
> NOT NULL condition
> to the other side of useless JOIN makes a big difference in the
> query plan :
>
> -- THE BAD ONE : given the selectivity on c.name <http://c.name>
> and c.email, barely more than one row will ever be returned
>
>
> But it looks like you're ignoring the fact that the OR condition would
> force the query to match not only the user and the email, but also any
> row that finds a match in the user_user_info table, which going by the
> planner's estimates, that's every row in the contract_contract table.
> This is why the planner chooses a seqscan on the contract_contract
> table instead of using the index on lower(name).
>
> Is it really your intention to get all rows that find a this martelli
> contract that has this email, and along with that, get every contract
> that has a not null user_info record?
>
> I see that you have a foreign key on c.user_info to reference the
> user, so this should be matching everything with a non null user_info
> record.
>
> explain analyze select c.*
> from contact_contact c
> left outer join user_user_info u on c.user_info=u.id
> <http://u.id>
> left outer join contact_address a on c.address=a.id
> <http://a.id>
> where lower(c.name <http://c.name>)='martelli'
> and c.email='dsfze(at)ezrfz(dot)com <mailto:dsfze(at)ezrfz(dot)com>' or
> u.id <http://u.id> is not null;
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
> Hash Left Join (cost=1.83..2246.76 rows=59412 width=4012)
> (actual time=53.645..53.645 rows=0 loops=1)
> Hash Cond: (c.user_info = u.id <http://u.id>)
> Filter: (((lower((c.name <http://c.name>)::text) =
> 'martelli'::text) AND ((c.email)::text = 'dsfze(at)ezrfz(dot)com
> <mailto:dsfze(at)ezrfz(dot)com>'::text)) OR (u.id <http://u.id> IS NOT NULL))
> Rows Removed by Filter: 58247
> -> Seq Scan on contact_contact c (cost=0.00..2022.12
> rows=59412 width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
> -> Hash (cost=1.37..1.37 rows=37 width=8) (actual
> time=0.029..0.029 rows=37 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 2kB
> -> Seq Scan on user_user_info u (cost=0.00..1.37 rows=37
> width=8) (actual time=0.004..0.015 rows=37 loops=1)
> Planning time: 0.790 ms
> Execution time: 53.712 ms
>
> -- THE GOOD ONE (test IS NOT NULL on contact0_.user_info instead
> of userinfo1_.id)
> explain analyze select c.*
> from contact_contact c
> left outer join user_user_info u on c.user_info=u.id
> <http://u.id>
> left outer join contact_address a on c.address=a.id
> <http://a.id>
> where lower(c.name <http://c.name>)='martelli'
> and c.email='dsfze(at)ezrfz(dot)com <mailto:dsfze(at)ezrfz(dot)com>' or
> c.user_info is not null;
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on contact_contact c (cost=8.60..16.41 rows=1
> width=4012) (actual time=0.037..0.037 rows=0 loops=1)
> Recheck Cond: (((email)::text = 'dsfze(at)ezrfz(dot)com
> <mailto:dsfze(at)ezrfz(dot)com>'::text) OR (user_info IS NOT NULL))
> Filter: (((lower((name)::text) = 'martelli'::text) AND
> ((email)::text = 'dsfze(at)ezrfz(dot)com
> <mailto:dsfze(at)ezrfz(dot)com>'::text)) OR (user_info IS NOT NULL))
> -> BitmapOr (cost=8.60..8.60 rows=2 width=0) (actual
> time=0.034..0.034 rows=0 loops=1)
> -> Bitmap Index Scan on idx_contact_email
> (cost=0.00..4.30 rows=2 width=0) (actual time=0.027..0.027 rows=0
> loops=1)
> Index Cond: ((email)::text = 'dsfze(at)ezrfz(dot)com
> <mailto:dsfze(at)ezrfz(dot)com>'::text)
> -> Bitmap Index Scan on contact_contact_user_info_idx
> (cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0
> loops=1)
> Index Cond: (user_info IS NOT NULL)
> Planning time: 0.602 ms
> Execution time: 0.118 ms
>
>
> If you look closely at the 2nd query plan, you'll see that no joins
> are performed, and it's only the contract_contract table that's looked
> at. This is because PostgresSQL sees that none of the columns from the
> 2 tables which are being left joined to are used, and also that the
> columns that you're joining to on these tables are unique, therefore
> joining to them cannot duplicate any rows, and since these are left
> joined, if there was no matching row, then it wouldn't filter out rows
> from the contract_contract table, as it would with INNER JOINs. The
> planner sees that these left joins are pointless, so just removes them
> from the plan.
>
> Regards
>
> David Rowley

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2014-10-20 13:58:08 Re: IS NOT NULL and LEFT JOIN
Previous Message Greg Stark 2014-10-19 22:24:51 Re: Yet another abort-early plan disaster on 9.3