Re: IS NOT DISTINCT FROM + Indexing

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: "Jonathan S(dot) Katz" <jonathan(dot)katz(at)excoventures(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: IS NOT DISTINCT FROM + Indexing
Date: 2014-07-21 23:51:32
Message-ID: CAM3SWZTSuzm4V5hJ-P9mGdTk5SS0XjEjCgTJMCP42tBuZhR3gQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 21, 2014 at 4:16 PM, Jonathan S. Katz
<jonathan(dot)katz(at)excoventures(dot)com> wrote:
> With NULLs being indexable, I was wondering if there was some reason why IS NOT DISTINCT FROM could not use the index?

FWIW this works:

postgres=# explain analyze select * from orders where orderid in (5, null);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Index Scan using orders_pkey on orders (cost=0.29..12.60 rows=1
width=60) (actual time=0.019..0.021 rows=1 loops=1)
Index Cond: (orderid = ANY ('{5,NULL}'::integer[]))
Planning time: 0.100 ms
Execution time: 0.416 ms
(4 rows)

I think that it would almost be a Simple Matter of Programming to make
IS NOT DISTINCT FROM indexable. Under the hood, IS DISTINCT FROM isn't
very different to using the equality operator:

/*
* DistinctExpr - expression node for "x IS DISTINCT FROM y"
*
* Except for the nodetag, this is represented identically to an OpExpr
* referencing the "=" operator for x and y.
* We use "=", not the more obvious "<>", because more datatypes have "="
* than "<>". This means the executor must invert the operator result.
* Note that the operator function won't be called at all if either input
* is NULL, since then the result can be determined directly.
*/
typedef OpExpr DistinctExpr;

We're already inverting the equals operator. But that isn't
necessarily how a B-Tree index represents equality (that is, a
particular B-Tree operator class could have a non-'=' operator that it
thinks of as equality-ish - in general that could even be the default
B-Tree opclass and there may not be an equals operator). The fact that
most types think of the '=' equals operator as equality is just a
convention, and so technically IS DISTINCT FROM doesn't invert B-Tree
operation 3. See "31.14. Interfacing Extensions To Indexes" for
details. The equals operator '=' isn't really supposed to be magic, it
just is in some places.

Right now the executor is directly inverting the equality operator to
make this work (and has done so since long before NULLs were
indexable). This is a bit of a kludge. I guess it just works that way
because there is no convenient place to insert the special inversion
of the operator, and the special NULL handling that currently appears
within ExecEvalDistinct().

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-07-21 23:57:15 Re: IS NOT DISTINCT FROM + Indexing
Previous Message Jonathan S. Katz 2014-07-21 23:16:33 IS NOT DISTINCT FROM + Indexing