Re: Query plan for NOT IN

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Guy Rouillier" <guyr-ml1(at)burntmail(dot)com>, Grzegorz Ja*kiewicz <gryzman(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query plan for NOT IN
Date: 2009-10-07 14:39:59
Message-ID: 4ACC61EF020000250002B6B7@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Grzegorz Jaœkiewicz<gryzman(at)gmail(dot)com> wrote:
> Guy Rouillier <guyr-ml1(at)burntmail(dot)com>wrote:
>> Grzegorz Jaœkiewicz wrote:

>>> using nulls as default 'idunno' - is a bad practice

>> I don't understand this point of view. The concept of null was
>> introduced into the SQL vernacular by Codd and Date expressly to
>> represent unknown values.

> if by default your account balance is 0, you should set it to 0, not
> leave it as null

If your business rules are that a new account is created with a zero
balance and then deposits are made, sure -- insert the account row
with a zero balance, *because you know it to be zero*. It's been rare
that I've seen anyone err on the side of using NULL in place of a
default for such cases. Much more common is using, for example, 'NMI'
in the middle name column to denote "No Middle Initial". Such "magic
values" can cause no end of trouble.

A failing of the SQL standard is that it uses the same mark (NULL) to
show the absence of a value because it is unknown as for the case
where it is known that no value exists (not applicable). Codd argued
for a distinction there, but it hasn't come to pass, at least in the
standard. If anyone could suggest a way to support standard syntax
and semantics and add extensions to support this distinction, it might
be another advance that would distinguish PostgreSQL from "less
evolved" products. :-)

None of that changes the requirement that NOT IN must result in
UNKNOWN if any of the values involved are NULL. You can't say that my
birthday is not in the set of birthdays for other subscribers to this
list without knowing the birthdays of all subscribers. This
definition of the operator makes it hard to optimize, but setting
unknown birthdays to some date far in the past or future, to avoid
using NULL, would just result in bogus results for this query as well
as, for example, queries attempting to calculate aggregates on age.

-Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Guy Rouillier 2009-10-07 17:33:16 Re: Query plan for NOT IN
Previous Message Grzegorz Jaśkiewicz 2009-10-07 13:27:03 Re: Query plan for NOT IN