Re: subselect syntax

From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: Steve Lefevre <lefevre(dot)10(at)osu(dot)edu>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: subselect syntax
Date: 2007-06-06 23:26:59
Message-ID: 1821C578-8D90-4837-9894-2F1E0264F7EE@seespotcode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Jun 6, 2007, at 16:39 , Steve Lefevre wrote:

> I have a table of projects, users, and user_projects. The table
> user_projects creates a many-to-many relationship between users and
> projects.
>
> I'm creating a select list on a web form for adding a new user to a
> project. I want to select users from the user table, who aren't
> already listed in the join table under that project.

What your looking for is sometimes called a semijoin: getting
everything that *isn't* associated with something. SQL doesn't have a
semijoin operator, but there are a couple of workarounds.

First, here's the setup I'm using:

CREATE TABLE users (user_id TEXT PRIMARY KEY);

INSERT INTO users (user_id)
VALUES ('fred')
, ('wilma')
, ('barney')
, ('betty');

CREATE TABLE projects ( project_id TEXT PRIMARY KEY);

INSERT INTO projects (project_id)
VALUES ('foo')
, ('bar')
, ('baz');

CREATE TABLE user_projects
(
user_id TEXT NOT NULL REFERENCES users
, project_id TEXT NOT NULL REFERENCES projects
, PRIMARY KEY (user_id, project_id)
);

INSERT INTO user_projects (user_id, project_id)
VALUES ('fred', 'foo')
, ('wilma', 'foo')
, ('barney', 'bar')
, ('betty', 'bar')
, ('wilma', 'baz')
, ('betty', 'baz');

> Here's my select:
> SELECT * FROM users
> WHERE user_id <> $current_user_id
> AND user_id <> ( SELECT user_id FROM user_projects WHERE project_id =
> $project_id )
>
> This query returns no rows, even on projects that have no records in
> the user_projects table!

Oh, really? I get an error when I try to run your example:

SELECT *
FROM users
WHERE user_id <> 'barney'
AND user_id <> (
SELECT user_id
FROM user_projects
WHERE project_id = 'baz');
ERROR: more than one row returned by a subquery used as an expression

Note that this will work (at least in 8.2.4) if the subquery only
returns a single row, but you'd run into problems as soon as you have
more than one user affiliated with a given project.

One of the ways to get a semijoin is using EXCEPT. We want to get all
of the users *except* those affiliated with a certain project (in
this case 'baz'):

-- get all users
SELECT user_id
FROM users
EXCEPT
-- all of the users who are associated with project baz
SELECT user_id
FROM user_projects
WHERE project_id = 'baz';
user_id
---------
barney
fred
(2 rows)

Let's say barney is the current user, so we add a restriction:

SELECT user_id
FROM (
-- get all users
SELECT user_id
FROM users
EXCEPT
-- all of the users who are associated with project baz
SELECT user_id
FROM user_projects
WHERE project_id = 'baz'
) non_baz_users
WHERE user_id <> 'barney';
user_id
---------
fred
(1 row)

We can also reformulate that by moving the restriction up into the
first query:
-- get all users other than barney
SELECT user_id
FROM users
WHERE user_id <> 'barney'
EXCEPT
-- all of the users who are associated with project baz
SELECT user_id
FROM user_projects
WHERE project_id = 'baz';
user_id
---------
fred
(1 row)

Another way to do it is to use a LEFT JOIN, taking advantage of the
fact that columns will be filled with NULL where there is no
association. This is often faster than using EXCEPT.

SELECT user_id, project_id
FROM users
LEFT JOIN (
-- users for project baz
SELECT *
FROM user_projects
WHERE project_id = 'baz') u_p USING (user_id);
user_id | project_id
---------+------------
fred |
wilma | baz
barney |
betty | baz
(4 rows)

Note that all users are returned. Now we just restrict the result to
those whose project_id is null (because they're not associated with
project baz) and those who aren't barney:

SELECT user_id
FROM users
LEFT JOIN (
-- users for project baz
SELECT *
FROM user_projects
WHERE project_id = 'baz') u_p USING (user_id)
WHERE project_id IS NULL -- all users who aren't associated with
project baz
-- so their project_id column is NULL
AND user_id <> 'barney'; -- and we don't want barney
user_id
---------
fred
(1 row)

A third way would have been to rewrite your query using NOT IN rather
than <>:

SELECT *
FROM users
WHERE user_id <> 'barney'
AND user_id NOT IN (
SELECT user_id
FROM user_projects
WHERE project_id = 'baz');
user_id
---------
fred
(1 row)

Hope that helps.

Michael Glaesemann
grzm seespotcode net

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Loredana Curugiu 2007-06-07 14:08:49 Query with tables from 2 different databases in Java
Previous Message Tom Lane 2007-06-06 23:19:43 Re: subselect syntax