Re: problem (bug?) with "in (subquery)"

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Luca Pireddu <luca(at)cs(dot)ualberta(dot)ca>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: problem (bug?) with "in (subquery)"
Date: 2005-07-15 13:34:04
Message-ID: 20050715133404.GA30210@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Thu, Jul 14, 2005 at 01:34:21AM -0600, Luca Pireddu wrote:
> I have the following query that isn't behaving like I would expect:
>
> select * from strains s where s.id in (select strain_id from pathway_strains);

Any reason the subquery isn't doing "SELECT DISTINCT strain_id"?

> I would expect each strain record to appear only once. Instead I get output
> like this, where the same strain id appears many times:
>
> id | name | organism
> -------+--------------+----------
> 83 | common | 82
> 83 | common | 82
> 83 | common | 82

What happens when you try each of the following? Do they give the
expected results? I did some tests and I'm wondering if the planner's
hash join is responsible for the duplicate rows.

SELECT * FROM strains WHERE id IN (
SELECT strain_id FROM pathway_strains ORDER BY strain_id
);

CREATE TEMPORARY TABLE foo AS SELECT strain_id FROM pathway_strains;
SELECT * FROM strains WHERE id IN (SELECT strain_id FROM foo);

SET enable_hashjoin TO off;
SELECT * FROM strains WHERE id IN (SELECT strain_id FROM pathway_strains);

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2005-07-15 13:59:27 Re: problem (bug?) with "in (subquery)"
Previous Message Neil Dugan 2005-07-15 12:16:08 Re: How to obtain the list of data table name only