Re: Null comparisons (was Re: checksum)

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Null comparisons (was Re: checksum)
Date: 2004-09-29 11:05:20
Message-ID: Pine.LNX.4.61.0409291000290.5367@Megathlon.ESI
Lists: pgsql-general

On Tue, 28 Sep 2004, Greg Stark wrote:

>
> Marco Colombo <pgsql(at)esiway(dot)net> writes:
>
>> (a = b or (a is null and b is null))
>>
>> that raises a flag for me. It seems that NULL is used as a special value,
>> which is not.
>
> Well, as I said, it raised a flag for me too. However, it's not good to be too
> dogmatic about things. General rules are useful guiding principles but you
> have to recognize when it's worth it to break them. We don't know enough about
> his problem to say another approach would be any better.

The way SQL treats NULLs in boolean expressions is not a "general rule".
It's just the way it is. I'm being pragmatic, not dogmatic.

> For example, in one application I have a table that *does* have "unknown"
> values. However I do need to look up records that match criteria including
> having "unknown" values in specific positions. For most queries using NULL is
> convenient and it's perfectly appropriate. But I have queries like this user
> does and I use coalesce since I find the resulting expression much clearer
> than using the three-way logical expression above.

Using special values is a mistake, but I agree that saying so is being
dogmatic.

Using NULL as a special value is a far worse mistake. Three-valued logic
is there, and there's nothing you can do about that. NULL can't be a value.
That's why you can't use it in comparisons.
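
A quick illustration in psql, for what it's worth:

    SELECT NULL = NULL;    -- yields NULL, not true
    SELECT 1 = NULL;       -- yields NULL as well
    SELECT NULL IS NULL;   -- true: IS NULL is the only honest test

Any comparison involving NULL is "unknown", so a plain equality join
silently drops the rows with NULLs in the join columns.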

As for the "dogma" part, I bet you're using special values only to
model states, not values, and you're overloading the value columns to
do it. Just add another column and life will be better.
This has nothing to do with NULLs, BTW.
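
Just to sketch what I mean (table and column names are made up, of
course): keep the state in its own column, instead of reserving a magic
number in the value column.

    CREATE TABLE payment (
        id      integer PRIMARY KEY,
        amount  numeric,                         -- the value, when known
        pending boolean NOT NULL DEFAULT false   -- the state, modeled on its own
    );

Then the "special" rows are simply the rows with pending = true, and
nobody has to remember which number means what.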

> Incidentally, coalesce(a,0)=coalesce(b,0) has the advantage over all the other
> suggestions that you can build an index on coalesce(a,0) and/or coalesce(b,0)
> and use them for the join or for individual record lookups.
>
>> BTW,
>>
>> coalesce(a,0) = coalesce(b,0)
>>
>> is wrong, since it assumes 0 is a special value, never used in the
>> table. If so, it's better use it from the start instead of NULL for
>> those special rows.
>
> I specifically said you had to use a special value in my suggestion. Saying
> something is "wrong" when it does what's needed just because it violates some
> abstract design principle is just short-sighted.
>
> Using 0 in the table might violate unique constraints or foreign key
> constraints. I try to avoid having a single quirky table propagate its
> quirkiness to the rest of the system.

_Unique_ constraints? I don't get this. 0 has no special meaning.
1, -1, 1000000 might violate unique constraints as well. Any value might.
What's the point here? The same goes for foreign key constraints.
Actually, the same goes for _any_ constraint.

I agree that using a value you _know_ is invalid due to some constraint
allows you to do the "coalesce trick" safely. But this assumes there is
at least _one_ invalid value, and that is not true in general. In general,

coalesce(a,0) = coalesce(b,0)

is not the same as

(a = b) or (a is null and b is null).

I've already provided an example of this. It can't be a general equivalence.
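
To make the difference concrete (toy values, integer columns assumed):
take a = 0 and b NULL, with 0 being a perfectly legitimate value.

    -- a = 0, b = NULL
    SELECT coalesce(0, 0) = coalesce(NULL, 0);           -- true: a bogus match
    SELECT (0 = NULL) OR (0 IS NULL AND NULL IS NULL);   -- null, i.e. no match

The coalesce version happily pairs a real 0 with a missing value.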

> For example, creating a bogus "0" record in some other table just to satisfy
> the foreign key constraint then having the rest of the application have to
> work around this bogus record results in a much less workable system than
> simply using NULL instead of 0 for the special value.

I'm lost here. I never proposed adding 0 anywhere. I just wonder:
"what if someone does?". In order to use 0 in coalesce(a,0) = coalesce(b,0)
you have to make sure 0 is invalid for both a and b. If you knew -1 was
invalid instead, you'd use coalesce(a,-1) = coalesce(b,-1).
But what if there's no invalid value at all?

It might seem a "dogmatic" question, but my point is: why bother?
Just get the model right. Mapping NULLs to 0 or -1 or whatever is
meaningless, if the model is right.

That's again the whole point. What you're proposing sounds like this:
"I'd use 0 (or other special value) in the table, but that's not good
cause it may break some constraint. So I use NULLs in place of my
special value, and convert them at later time with coalesce(), so that
I can compare them again."

Now, that's abusing NULLs. There's a reason why NULLs don't break
foreign key constraints, and a reason why you can't compare them.
Your use of coalesce() and your functional index are just putting a
brown paper bag over the real problem, which is a wrong model. Don't
play with NULLs, fix your schema. A badly designed model is not a
matter of "general principles". It's a plain real-world mistake, and
it leads to all kinds of acrobatic exercises in SQL to get the system
to work. One day you'll run into another query that you'll have a hard
time writing, or making run efficiently.

NULLs are not meant to be "values" and should never be used as such.
Any time you feel the need for an index on them, or to compare them,
you're treating them as values, and that's plain wrong. Not because
of any dogma, but because that's how the system works. The _only_
way to use them as values is to abuse them.

>
>> coalesce(a = b, a is null and b is null)
>>
>> is correct, and maybe slightly better than the original
>>
>> (a = b) or (a is null and b is null)
>>
>> if the implementation is smart enough to evaluate its arguments only
>> when needed. The or operator needs to evaluate the right side when
>> the left side is either false or null, COALESCE only when it's null.
>> I think the docs mention that.
>
> Actually it's only the latter expression that will be able to avoid evaluating
> the extra expression, not the coalesce example. In any case the time to
^^^^^^^^^^^^^^^^^^^^^^^^
Why not?

> evaluate the "a is null and b is null" part is negligible. And the fact that
> neither can use any indexes is the only relevant performance question. It's
> possible that's not a concern, but if it is they both lose.

We agree here. I wrote "maybe slightly better".
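
For what it's worth, both forms do agree on all three cases (toy values
again):

    SELECT coalesce(1 = 1, 1 IS NULL AND 1 IS NULL);              -- true
    SELECT coalesce(1 = NULL, 1 IS NULL AND NULL IS NULL);        -- false
    SELECT coalesce(NULL = NULL, NULL IS NULL AND NULL IS NULL);  -- true

The only open question is whether the right-hand expression gets
evaluated at all, which is what we're discussing above.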

> That's one possible argument in favour of a === operator. It would be easy (I
> think?) to make === use a btree index without even having to build a
> functional index like with coalesce(a,0).

A new, non-standard operator just to support badly modeled schemas?
No, thanks.

In C, you can store a (small) string in a pointer-typed variable,
if you're careful enough and know what you're doing. It might work
for you, but it's still abusing the language: you can't expect the
language to _support_ that. And you certainly can't sell it as a
general solution.

I'm not against abusing the db, or playing dirty tricks, if that fits
your needs. You're free to design your db the way you like and pay the
cost, either of careful design up front or of SQL gymnastics later.
That's fine with me, as long as you don't ask for syntactic sugar to
support those "features".

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it
