Re: "SELECT .. WHERE NOT IN" query running for hours

From: Scott Carey <scott(at)richrelevance(dot)com>
To: ??????? ???????? <lebiathan(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: "SELECT .. WHERE NOT IN" query running for hours
Date: 2011-01-10 20:24:10
Message-ID: C950A6CB.18626%scott@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 1/7/11 1:29 AM, "??????? ????????" <lebiathan(at)gmail(dot)com> wrote:

>
>So my follow-up question on the subject is this:
>
>Are there any particular semantics for the "NOT IN" statement that cause
>the correlated query to execute for every row of the outter query, as
>opposed to the "NOT EXISTS" ?

=> select * from explode_array(ARRAY[1,2,3,4,5]) where explode_array not
in (0, 1, 2);
explode_array
---------------
3
4
5

=> select * from explode_array(ARRAY[1,2,3,4,5]) where explode_array not
in (0, 1, 2, null);
explode_array
---------------
(0 rows)

The existence of a single NULL in the "not in" segment causes no results
to be returned. Postgres isn't smart enough to analyze whether the
contents of the NOT IN() can contain null and choose a more optimal plan,
and so it always scans ALL rows. Even if the NOT IN() is on a not null
primary key. NOT IN is generally dangerous because of this behavior --
it results from the fact that '1 = null' is null, and 'not null' is equal
to 'null':

=> select (1 = 1);
?column?
----------
t

select NOT (1 = 1);
?column?
----------
f

=> select (1 = null);
?column?
----------

(1 row)

=> select NOT (1 = null);
?column?
----------

(1 row)

NOT EXISTS doesn't have this problem, since NOT EXISTS essentially treats
the existence of null as false, where NOT IN treats the existence of null
as true.

rr=> select * from (select * from explode_array(ARRAY[1,2,3,4,5])) foo
where not exists (select 1 where explode_array in (0, 1, 2, null));
explode_array
---------------
3
4
5
(3 rows)

Often, the best query plans result from 'LEFT JOIN WHERE right side is
NULL' rather than NOT EXISTS however. I often get performance gains by
switching NOT EXISTS queries to LEFT JOIN form. Though sometimes it is
less performant.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2011-01-10 20:37:50 Re: "SELECT .. WHERE NOT IN" query running for hours
Previous Message Mladen Gogala 2011-01-10 17:28:34 Re: "SELECT .. WHERE NOT IN" query running for hours