varchar index joins not working?

From: "Adam Gundy" <adam(at)starsilk(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: varchar index joins not working?
Date: 2008-04-10 03:13:23
Message-ID: 6f55f1270804092013n73ee1f81t2c6d72cee49655a3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'm hitting an unexpected problem with postgres 8.3 - I have some
tables which use varchar(32) for their unique IDs which I'm attempting
to join using some simple SQL:

select *
from group_access, groups
where group_access.groupid = groups.groupid and
group_access.uid = '7275359408f44591d0717e16890ce335';

there's a unique index on group_access.groupid, and a non-unique index
on groups.groupid. both are non-null.

the problem is: if groupid (in both tables) is varchar, I cannot force
postgres (no matter how hard I try) to do an index scan. it ends up
reading the entire groups table (pretty large!):

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=8.89..41329.88 rows=119940 width=287) (actual
time=0.202..935.136 rows=981 loops=1)
Hash Cond: ((groups.groupid)::text = (group_access.groupid)::text)
-> Seq Scan on groups (cost=0.00..31696.48 rows=1123348
width=177) (actual time=0.011..446.091 rows=1125239 loops=1)
-> Hash (cost=8.51..8.51 rows=30 width=110) (actual
time=0.148..0.148 rows=30 loops=1)
-> Seq Scan on group_access (cost=0.00..8.51 rows=30
width=110) (actual time=0.014..0.126 rows=30 loops=1)
Filter: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
Total runtime: 935.443 ms
(7 rows)

if I disable seq_scan, I get this:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=1.47..106189.61 rows=120004 width=287) (actual
time=0.100..1532.353 rows=981 loops=1)
Merge Cond: ((group_access.groupid)::text = (groups.groupid)::text)
-> Index Scan using group_access_pkey on group_access
(cost=0.00..43.91 rows=30 width=110) (actual time=0.044..0.148 rows=30
loops=1)
Index Cond: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
-> Index Scan using groups_1_idx on groups (cost=0.00..102135.71
rows=1123952 width=177) (actual time=0.031..856.555 rows=1125827
loops=1)
Total runtime: 1532.880 ms
(6 rows)

it's running an index scan across the entire table (no condition applied) :-(

so, just for the hell of it, I tried making groupid a char(32),
despite repeated assertions in this group that there's no performance
difference between the two:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.48..253.85 rows=304 width=291) (actual
time=0.715..22.906 rows=984 loops=1)
-> Bitmap Heap Scan on group_access (cost=4.48..9.86 rows=30
width=111) (actual time=0.372..0.570 rows=30 loops=1)
Recheck Cond: (uid = '7275359408f44591d0717e16890ce335'::bpchar)
-> Bitmap Index Scan on group_access_uid_key
(cost=0.00..4.48 rows=30 width=0) (actual time=0.331..0.331 rows=30
loops=1)
Index Cond: (uid = '7275359408f44591d0717e16890ce335'::bpchar)
-> Index Scan using groups_1_idx on groups (cost=0.00..7.96
rows=14 width=180) (actual time=0.176..0.396 rows=33 loops=30)
Index Cond: (groups.groupid = group_access.groupid)
Total runtime: 26.837 ms
(8 rows)

(this last plan is actually against a smaller test DB, but I get the
same behavior with it, seq scan for varchar or index scan for char,
and the results returned are identical for this query)

the databases are UTF-8, if that makes a difference...

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Arjen van der Meijden 2008-04-10 07:13:39 Re: large tables and simple "= constant" queries using indexes
Previous Message samantha mahindrakar 2008-04-10 02:43:17 Re: Performance with temporary table