Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'

From: "Ian Sillitoe" <ian(dot)sillitoe(at)googlemail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'
Date: 2008-04-02 18:43:27
Message-ID: c6ff42340804021143p7e688b6ch8c0f4f98feaec0ea@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I completely take your points - so maybe I should be asking for advice on
database design instead.

We are annotating nodes on a hierarchical structure where NULL implied an
absence of a value in this depth of the hierarchy. As a method of enforcing
this view, we use constraints to make sure we only get one combination of
rows for a given node of the hierarchy (whether the columns are integers or
NULLs). I'm not suggesting this was 'correct', I'm just saying this was what
we were thinking.

We considered using a flag of some description instead, however this also
seemed a bit messy (although in hindsight probably far less messy than the
job I now face trying to recode this table).

psql> select * from tree;
id | depth1 | depth2 | depth3 | name
----+--------+--------+--------+--------------------------
1 | 1 | | | Name for node_id "1"
1 | 1 | 1 | | Name for node_id "1.1"
2 | 1 | 2 | | Name for node_id "1.2"
3 | 1 | 2 | 3 | Name for node_id "1.2.3"
(3 rows)

psql> select * from leaf_node;
id | depth1 | depth2 | depth3 | depth4 | depth5
--------+--------+--------+--------+--------+--------
leaf_1 | 1 | 2 | 3 | 1 | 1
leaf_2 | 1 | 2 | 3 | 1 | 2
leaf_3 | 1 | 2 | 3 | 1 | 3
leaf_4 | 1 | 2 | 3 | 2 | 1
leaf_5 | 1 | 2 | 3 | 2 | 2
(5 rows)

So, is there a generally accept method of modelling nodes in hierarchical
structures (i.e. the 'tree' table in the example above) that doesn't use
NULLs? We started off using ids such as '1.1.2' as the PRIMARY KEY but
concatenating strings to go between columns and ids ended up being too
painful.

Would it be sensible to use an ARRAY of depths as the primary key? I guess
this would mean it would be possible to join

psql> select * from tree;
id | depth1 | depth2 | depth3 | name
------+--------+--------+--------+--------------------------
[1] | 1 | | | Name for node_id "1"
[1,2] | 1 | 1 | | Name for node_id "1.1"

Advice most welcome.

Cheers,

Ian

On Wed, Apr 2, 2008 at 6:48 PM, Ted Byers <r(dot)ted(dot)byers(at)rogers(dot)com> wrote:

> --- Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> > 'transform_null_equals' won't help you at all here
> > since it only help
> > in the very specific case of comparing with a
> > constant. The easiest is
> > to think of NULL as meaning 'unknown'. Clearly you
> > can't compare that
> > usefully with anything.
> >
> Not even a null in another record ... (hence my
> question below). If the value is unknown, then it
> could be anything, and (thinking as a mathematician
> considering real numbers) the probability of two
> records having null having their true, but unknown
> values be the same is indistinguishable from 0. (with
> integers or decimal numbers or floating point numbers,
> that would be qualified with the clause, for practical
> purposes :)
>
> > Perhaps you can use a marker like -1 to achieve the
> > effect you want?
> >
> Is that really valid, though, especially in a join? I
> mean, if the column in question has multiple nulls, in
> each of the tables, then how do you, of the DB, figure
> out which of the rows containing nulls in the one
> table match up with rows in the other table containing
> nulls? Or is the resultset supposed to be the product
> of the two sets (match each row with a null in the one
> table with each row with a null in the other)? That,
> for me, creates a nightmare situation where some of my
> tables have tens of millions of rows, and if even 1%
> of the rows contains null in the relevant column, I
> don't even want to think about processing the
> resultset that would be produced from such an idea
> using these tables.
>
> My joins always only involve primary keys, or indeces
> on columns that prohibit nulls, so this problem
> doesn't crop up in my code, but would I be not too far
> from right in expecting that the rational thing to do
> when creating a join on columns that allow nulls is to
> exclude ALL rows, in either table, where the columns
> involved are null?
>
> Cheers,
>
> Ted
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2008-04-02 18:53:20 Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'
Previous Message Brent Wood 2008-04-02 18:43:05 Re: (FAQ?) JOIN condition - 'WHERE NULL = NULL'