Re: Null comparisons (was Re: checksum)

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: David Helgason <david(at)uti(dot)is>
Cc: "Dean Gibson (DB Administrator)" <postgresql3(at)ultimeth(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Null comparisons (was Re: checksum)
Date: 2004-09-28 15:27:03
Message-ID: Pine.LNX.4.61.0409281627180.12692@Megathlon.ESI
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, 27 Sep 2004, David Helgason wrote:

> On 27. sep 2004, at 22:08, Dean Gibson (DB Administrator) wrote:
>> Greg Stark wrote on 2004-09-27 08:17:
>>> Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
>>> >> On Sun, 26 Sep 2004 20:16:52 +0200, David Helgason <david(at)uti(dot)is>
>>> wrote:
>>> >>> On a similar note, I've found myself wanting an extended '=' operator
>>> >>> meaning
>>> >>> (a = b or (a is null and b is null))
>>> >
>>> > The original does appear to be equivalent to "not(a is distinct from
>>> b)",
>>> > although I'm not sure that's necessarily easier to use than the above.
>>>
>>> I often do things like "coalesce(a,0) = coalesce(b,0)".
>>> (Or whatever value you know won't appear)
>>>
>> Even simpler: COALESCE( a = b, a IS NULL AND b IS NULL )
>
> I'm not quite sure what is being accomplished here... My original expression
> wasn't that bad, just clunky. I'd prefer a === b or (a samevalue b), but the
> above just complicates matters. Also, a 'set' command outside the expression
> goes completely against the idea, that certain fields have 'null' as a legal,
> comparable value, while others do not.
>
> Anyway, idle speculation :)
>
> d.

(a = b or (a is null and b is null))

that raises a flag for me. It seems that NULL is used as a special value,
which is not. NULL just means 'unknown', nothing more, nothing less.
That's why any boolean expression involving a NULL is NULL.

'unknown' when compared to anything else just gives 'unknown'.
It means: "I can't tell whether the two expressions are the same, because
I don't know the value of one of them."
Note that that's different from "I know they're different".
They _could_ be equal, we just don't know.

'unknown' compared to 'unknown' gives of course 'unknown', since it's
just a special case of the above. Not knowing the value of both the
expressions doesn't help much. The answer can never be 'I know they
are the same.' Not knowing one is enough to say you don't know the
result of the comparison.

I can hardly imagine why you may want to select all rows that you
are certain have equal values, plus others that have potentially
different values because they are both unknown. It smells like a
design problem.

BTW,

coalesce(a,0) = coalesce(b,0)

is wrong, since it assumes 0 is a special value, never used in the
table. If so, it's better use it from the start instead of NULL for
those special rows. That espression is true for the following rows:
a | b
---+---
1 | 1
2 | 2
|
0 |
| 0

the last two rows are wrongly selected.

coalesce(a = b, a is null and b is null)

is correct, and maybe slightly better than the original

(a = b) or (a is null and b is null)

if the implementation is smart enough to evaluate its arguments only
when needed. The or operator needs to evaluate the right side when
the left side is either false or null, COALESCE only when it's null.
I think the docs mention that.

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Barry S 2004-09-28 15:29:08 Re: Hardware recommendations
Previous Message Gaetano Mendola 2004-09-28 15:20:00 Re: Getting an out of memory failure.... (long email)