How to find missing values across multiple OUTER JOINs

From: Drew <drewmwilson(at)fastmail(dot)fm>
To: pgsql-sql(at)postgresql(dot)org
Subject: How to find missing values across multiple OUTER JOINs
Date: 2007-06-06 06:55:28
Message-ID: 4FD9368D-D4CA-44CF-887B-EBB64DCE46BA@fastmail.fm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I'm having troubles using multiple OUTER JOINs, which I think I want
to use to solve my problem.

My problem is to find all non-translated string values in our
translations database, given the 4 following tables:

SOURCE (source_id PRIMARY KEY, language_id, value, ...)
TRANSLATION (translation_id PRIMARY KEY, language_id, value, ...)
TRANSLATION_PAIR (source_id, translation_id)
LANGUAGE(language_id PRIMARY KEY, name)

This seems to me the appropriate situation for using OUTER JOINs, but
I cannot figure out how to get the null rows without the not-null rows.

Here's my best guess at this query:
SELECT
s.source_id,tp.translation_pair_id,t.translation_id,t.language_id,
l.name

FROM source s
LEFT OUTER JOIN translation_pair tp USING(source_id)
LEFT OUTER JOIN translation t ON tp.translation_id = t.translation_id
AND t.translation_id is null)
RIGHT OUTER JOIN language l on l.language_id = t.language_id;

To test this query, I have a string that only has a translation in
English and used it in this test query.
SELECT
s.source_id,tp.translation_pair_id,t.translation_id,t.language_id,
l.name
FROM (select * FROM source s WHERE source_id = 159986) as s
LEFT OUTER JOIN translation_pair tp USING(source_id)
LEFT OUTER JOIN translation t ON tp.translation_id =
t.translation_id
RIGHT OUTER JOIN language l on l.language_id = t.language_id;

This yields promising results:
source_id | translation_pair_id | translation_id | language_id
| name
-----------+---------------------+----------------+-------------
+----------------------
| | | |
Russian
| | | |
Danish
| | | | Dutch
159986 | 1893187 | 1743833 | 4 |
English
| | | |
Finnish
| | | |
French
| | | |
German
| | | |
Italian
| | | |
Japanese
| | | |
Korean
| | | |
Norwegian
| | | |
Simplified Chinese
| | | |
Spanish
| | | |
Swedish
| | | |
Traditional Chinese
| | | |
Portuguese
| | | |
Polish
| | | |
Turkish
| | | | Czech
| | | |
Brazilian Portuguese
(20 rows)

However, when I try to exclude the one not-null row, doing this:
SELECT
s.source_id,tp.translation_pair_id,t.translation_id,t.language_id,
l.name
FROM (select * FROM source s WHERE source_id = 159986) AS s
LEFT OUTER JOIN translation_pair tp USING(source_id)
LEFT OUTER JOIN translation t ON tp.translation_id =
t.translation_id AND t.translation_id IS NULL
RIGHT OUTER JOIN language l on l.language_id = t.language_id;

I expect 19 rows, but instead get 20 rows, all containing null values.
source_id | translation_pair_id | translation_id | language_id
| name
-----------+---------------------+----------------+-------------
+----------------------
| | | |
Russian
| | | |
Danish
| | | | Dutch
| | | |
English
| | | |
Finnish
| | | |
French
| | | |
German
| | | |
Italian
| | | |
Japanese
| | | |
Korean
| | | |
Norwegian
| | | |
Simplified Chinese
| | | |
Spanish
| | | |
Swedish
| | | |
Traditional Chinese
| | | |
Portuguese
| | | |
Polish
| | | |
Turkish
| | | | Czech
| | | |
Brazilian Portuguese
(20 rows)

I'm guessing I need to group the joins together, to avoid some
associative problem.

Do you see what I'm doing wrong?

Thanks for the help,

Drew

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Loredana Curugiu 2007-06-06 08:26:30 Re: [SQL] JOIN
Previous Message Joe Conway 2007-06-06 03:34:39 Re: Encrypted column