Re: Index ignored with "is not distinct from", 8.2 beta2

From: "JEAN-PIERRE PELLETIER" <pelletier_32(at)sympatico(dot)ca>
To: jim(at)nasby(dot)net
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index ignored with "is not distinct from", 8.2 beta2
Date: 2006-11-07 02:10:40
Message-ID: BAY118-F62F03D90186754B675B7295F20@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This shows all three forms to be equivalent.

SELECT
exp1,
exp2,
exp1 IS NOT DISTINCT FROM exp2 AS isnotdistinct,
exp1 is not null and exp2 is not null and exp1 = exp2 or exp1 is null and
exp2 is null AS JP,
(exp1 is not null and exp2 is not null and exp1 = exp2) or (exp1 is null
and exp2 is null) AS Jim
FROM
(SELECT 1 AS exp1, 1 AS exp2
UNION ALL SELECT 1, 2
UNION ALL SELECT 1,NULL
UNION ALL SELECT NULL,1
UNION ALL SELECT NULL,NULL) Q;

I understand that the planner doesn't use indexes for IS NOT DISTINCT FROM,
but it would
be good because "is not distinct from" is very useful when you have nulls
but don't want to use three value logic.

null = null => true
null = not null => false

I don't think it is that uncommon and even some SQL constructs such as
"select distinct" or "group by" compare null that way.

I'll wait before using IS NOT DISTINCT FROM and stick with the equivalent
longer forms which
use indexes.

8.2 is better than 8.1 which was not picking up indexes even with the
longer forms.

Jean-Pierre Pelletier

>From: "Jim C. Nasby" <jim(at)nasby(dot)net>
>To: JEAN-PIERRE PELLETIER <pelletier_32(at)sympatico(dot)ca>
>CC: pgsql-hackers(at)postgresql(dot)org
>Subject: Re: [HACKERS] Index ignored with "is not distinct from", 8.2 beta2
>Date: Mon, 6 Nov 2006 16:02:40 -0600
>
>One issue is that I'm not sure think you've got your sugar quite right.
>Have you tested with:
>
>(col IS NOT NULL AND 123 IS NOT NULL AND col = 123) OR
> (col IS NULL and 123 IS NULL) ?
>
>It's possible that the planner doesn't know about using an index for
>DISTINCT; or it might just want an index that's defined WHERE col IS NOT
>NULL.
>
>On Wed, Nov 01, 2006 at 03:29:33PM -0500, JEAN-PIERRE PELLETIER wrote:
> > I've reposted this from pgsql-performance where I got no response.
> >
> > ==========================================
> >
> > Hi,
> >
> > I wanted to use "exp1 is not distinct from exp2" which I tough was
>syntaxic
> > sugar for
> > exp1 is not null and exp2 is not null and exp1 = exp2 or exp1 is null
>and
> > exp2 is null
> > but my index is ignored with "is not distinct from".
> >
> > Is this the expected behavior ?
> >
> > create temporary table t as select * from generate_series(1,1000000)
>t(col);
> > create unique index i on t(col);
> > analyze t;
> >
> > -- These queries don't use the index
> > select count(*) from t where col is not distinct from 123;
> > select count(*) from t where not col is distinct from 123;
> >
> > -- This query use the index
> > select count(*) from t where col is not null and 123 is not null and col
>=
> > 123 or col is null and 123 is null;
> >
> > explain analyze select count(*) from t where col is not distinct from
>123;
> > QUERY PLAN
> >
>------------------------------------------------------------------------------------------------------------
> > Aggregate (cost=19154.79..19154.80 rows=1 width=0) (actual
> > time=228.200..228.202 rows=1 loops=1)
> > -> Seq Scan on t (cost=0.00..17904.90 rows=499956 width=0) (actual
> > time=0.042..228.133 rows=1 loops=1)
> > Filter: (NOT (col IS DISTINCT FROM 123))
> > Total runtime: 228.290 ms
> > (4 rows)
> > Time: 219.000 ms
> >
> > explain analyze select count(*) from t where not col is distinct from
>123;
> > QUERY PLAN
> >
>------------------------------------------------------------------------------------------------------------
> > Aggregate (cost=19154.79..19154.80 rows=1 width=0) (actual
> > time=235.950..235.952 rows=1 loops=1)
> > -> Seq Scan on t (cost=0.00..17904.90 rows=499956 width=0) (actual
> > time=0.040..235.909 rows=1 loops=1)
> > Filter: (NOT (col IS DISTINCT FROM 123))
> > Total runtime: 236.065 ms
> > (4 rows)
> > Time: 250.000 ms
> >
> > explain analyze select count(*) from t where col is not null and 123 is
>not
> > null and col = 123 or col is null and 123 is null;
> > QUERY PLAN
> >
>-----------------------------------------------------------------------------------------------------------
> > Aggregate (cost=8.13..8.14 rows=1 width=0) (actual time=0.267..0.268
> > rows=1 loops=1)
> > -> Index Scan using i on t (cost=0.00..8.13 rows=1 width=0) (actual
> > time=0.237..0.241 rows=1 loops=1)
> > Index Cond: (col = 123)
> > Total runtime: 0.366 ms
> > (4 rows)
> > Time: 0.000 ms
> >
> > I am on Windows XP Service pack 2 with PostgreSQL 8.2 beta2
> >
> > Thanks,
> > Jean-Pierre Pelletier
> > e-djuster
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> > http://archives.postgresql.org
> >
>
>--
>Jim Nasby jim(at)nasby(dot)net
>EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message ITAGAKI Takahiro 2006-11-07 04:02:31 Re: Introducing an advanced Frequent Update Optimization
Previous Message Mark Kirkwood 2006-11-07 02:00:53 Re: Introducing an advanced Frequent Update Optimization