Skip site navigation (1) Skip section navigation (2)

Re: varchar index joins not working?

From: Richard Huxton <dev(at)archonet(dot)com>
To: Adam Gundy <adam(at)starsilk(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: varchar index joins not working?
Date: 2008-04-10 08:46:09
Message-ID: 47FDD3D1.9070208@archonet.com (view raw or flat)
Thread:
Lists: pgsql-performance
Adam Gundy wrote:
> I'm hitting an unexpected problem with postgres 8.3 - I have some
> tables which use varchar(32) for their unique IDs which I'm attempting
> to join using some simple SQL:
> 
> select *
> from group_access, groups
> where group_access.groupid = groups.groupid and
>          group_access.uid = '7275359408f44591d0717e16890ce335';
> 
> there's a unique index on group_access.groupid, and a non-unique index
> on groups.groupid. both are non-null.

What about group_access.uid - I'd have thought that + groups pkey is 
probably the sensible combination here.

> the problem is: if groupid (in both tables) is varchar, I cannot force
> postgres (no matter how hard I try) to do an index scan. it ends up
> reading the entire groups table (pretty large!):

OK

>                                                         QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=8.89..41329.88 rows=119940 width=287) (actual
> time=0.202..935.136 rows=981 loops=1)

That's because it's expecting 119,940 rows to match (rather than the 
actual 981 you do get). If you were getting that many results this is 
probably a sensible plan.

>    Hash Cond: ((groups.groupid)::text = (group_access.groupid)::text)
>    ->  Seq Scan on groups  (cost=0.00..31696.48 rows=1123348
> width=177) (actual time=0.011..446.091 rows=1125239 loops=1)

It's got a good idea of the total number of rows in groups.

>    ->  Hash  (cost=8.51..8.51 rows=30 width=110) (actual
> time=0.148..0.148 rows=30 loops=1)
>          ->  Seq Scan on group_access  (cost=0.00..8.51 rows=30
> width=110) (actual time=0.014..0.126 rows=30 loops=1)

And also group_access. Oh, the seq-scan doesn't really matter here. It 
probably *is* faster to read all 30 rows in one burst rather than go to 
the index and then back to the table.

>                Filter: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
>  Total runtime: 935.443 ms
> (7 rows)
> 
> if I disable seq_scan, I get this:
> 
>                                                                    QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=1.47..106189.61 rows=120004 width=287) (actual
> time=0.100..1532.353 rows=981 loops=1)

It's still thinking it's going to get 120 thousand rows.

> it's running an index scan across the entire table (no condition applied) :-(
> 
> so, just for the hell of it, I tried making groupid a char(32),
> despite repeated assertions in this group that there's no performance
> difference between the two:

There's no performance difference between the two.

>  Nested Loop  (cost=4.48..253.85 rows=304 width=291) (actual
> time=0.715..22.906 rows=984 loops=1)

> (this last plan is actually against a smaller test DB, but I get the
> same behavior with it, seq scan for varchar or index scan for char,
> and the results returned are identical for this query)

The char(32) thing isn't important here, what is important is that it's 
expecting ~300 rows rather than 120,000. It's still wrong, but it's 
close enough to make sense.

So - the question is - why is PG expecting so many matches to your join. 
How many distinct values do you have in groups.groupid and 
group_access.group_id?

-- 
   Richard Huxton
   Archonet Ltd

In response to

Responses

pgsql-performance by date

Next:From: MatthewDate: 2008-04-10 09:51:13
Subject: Re: large tables and simple "= constant" queries using indexes
Previous:From: PFCDate: 2008-04-10 08:25:48
Subject: Re: large tables and simple "= constant" queries using indexes

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group