JOIN between three *simple* tables ...

From: "Marc G(dot) Fournier" <scrappy(at)hub(dot)org>
To: pgsql-sql(at)postgresql(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: JOIN between three *simple* tables ...
Date: 2002-02-06 20:47:45
Message-ID: 20020206162412.F57607-100000@earth.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql


Morning all ...

First off, this is using v7.2 release ...

Okay, this is going to drive me up the proverbial wall ... very
simple query:

SELECT p.uid, p.handle
FROM orientation_c poc, profiles p, gender_f pgf
WHERE (p.uid = pgf.uid )
AND (pgf.uid = poc.uid ) ;

profiles contains:

iwantu=# select count(1) from profiles;
count
--------
485969
(1 row)

and is everyone in the system ... no problems there ...

gender_f contains:

iwantu=# select count(1) from gender_f;
count
-------
75664
(1 row)

And is *just* the uid's of those in profiles that are female ...

finally, orientation_c:

iwantu=# select count(1) from orientation_c;
count
--------
126477
(1 row)

Is again *just* the uid's of those in profiles that have a 'c'
orientiation ...

Now, the above wquery has an explain of:

Hash Join (cost=6363.90..47877.08 rows=19692 width=35)
-> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)
-> Hash (cost=6174.74..6174.74 rows=75664 width=16)
-> Hash Join (cost=2928.34..6174.74 rows=75664 width=16)
-> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8)
-> Hash (cost=1948.77..1948.77 rows=126477 width=8)
-> Seq Scan on orientation_c poc (cost=0.00..1948.77 rows=126477 width=8)

Now, a join between poc and pgf alone comes out to:

iwantu=# select count(1) from orientation_c poc, gender_f pgf where poc.uid = pgf.uid;
count
-------
12703
(1 row)

iwantu=# explain select count(1) from orientation_c poc, gender_f pgf where poc.uid = pgf.uid;
NOTICE: QUERY PLAN:

Aggregate (cost=6363.90..6363.90 rows=1 width=16)
-> Hash Join (cost=2928.34..6174.74 rows=75664 width=16)
-> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8)
-> Hash (cost=1948.77..1948.77 rows=126477 width=8)
-> Seq Scan on orientation_c poc (cost=0.00..1948.77 rows=126477 width=8)

EXPLAIN

Now, what I'd like to have happen is a SEQ SCAN through the smaller table
(gender_f), and grab everything in orientation_c that matches (both tables
have zero duplicates of uid, its purely a one of, so I would think that I
should be able to take 1 uid from pgf, and use the index on poc to
determine if it exists, and do that 75664 times ...

That would live me with 12703 UIDs to match up with apropriate records in
the almost 500+k records in profiles itself, instead of having to scan
through each of thoose 500+k records themselves ...

Then again, let's go one simpler:

iwantu=# \d orientation_c
Table "orientation_c"
Column | Type | Modifiers
--------+--------+-----------
uid | bigint |
Indexes: poc_uid

iwantu=# \d poc_uid
Index "poc_uid"
Column | Type
--------+--------
uid | bigint
btree

iwantu=# explain select count(1) from orientation_c poc where uid = 1;
NOTICE: QUERY PLAN:

Aggregate (cost=2264.97..2264.97 rows=1 width=0)
-> Seq Scan on orientation_c poc (cost=0.00..2264.96 rows=1 width=0)

EXPLAIN

if all varlues in orientation_c are unique, and there are 127k
records ... shouldn't it use the index instead of scanning through all 127k records ? Or am I missing something totally obvious here?

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haroldo Stenger 2002-02-06 20:59:32 Re: Threaded PosgreSQL server
Previous Message Doug McNaught 2002-02-06 20:24:04 Re: Threaded PosgreSQL server

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2002-02-06 21:14:33 Re: JOIN between three *simple* tables ...
Previous Message Jeff Eckermann 2002-02-06 19:24:11 Re: Query with Parameters