Re: varchar index joins not working?

From: Richard Huxton <dev(at)archonet(dot)com>
To: Adam Gundy <adam(at)starsilk(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: varchar index joins not working?
Date: 2008-04-14 07:14:01
Message-ID: 48030439.9040309@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Adam Gundy wrote:
> On Thu, Apr 10, 2008 at 8:52 AM, Adam Gundy <adam(at)starsilk(dot)net> wrote:
>> Richard Huxton wrote:
>>> How many distinct values do you have in groups.groupid and
>> group_access.group_id?
>> for the small database (since it shows the same problem):
>>
>> group_access: 280/268
>> groups: 2006/139
>>
>> for the large database:
>>
>> group_access: same
>> groups: 1712647/140
>>
>> the groupid key is an MD5 hash, so it should be uniformly distributed.
>> maybe that throws the stats? but, again, char works, varchar doesn't :-(
>
> OK, I'm thinking the varchar/char part is not the issue.

Good, because it's not :-)

> the database is very unbalanced, most of the groups are 1000 or less
> records, with one group occupying 95% of the records.

I was wondering - that's why I asked for the stats.

> I *think* that when I analyze using char instead of varchar, it is
> recording a stat for the large group, but for some reason with varchar
> doesn't add a stat for that one.
>
> so, the real question is, how do I fix this? I can turn the stats way
> up to 1000, but that doesn't guarantee that I'll get a stat for the
> large group :-(

Well, by default it will be tracking the 10 most common values (and how
often they occur). As you say, this can be increased to 1000 (although
it obviously takes longer to check 1000 rather than 10).

We can have a look at the stats with something like:
SELECT * FROM pg_stats WHERE tablename='group_access' AND attname='uid';
You'll be interested in n_distinct, most_common_vals and most_common_freqs.

However, I think the problem may be that PG doesn't track cross-column
stats, so it doesn't know that a particular uid implies one or more
particular groupid values.

> can I turn the statistics off completely for this column? I'm guessing
> that if I can, that will mean it takes a guess based on the number of
> distinct values in the groups table, which is still large number of
> records, possibly enough to trigger the seqscan anyway.

No - can't disable stats. Besides, you want it the other way around -
index scans for all groups except the largest.

> does postgres have a way of building a 'counted index' that the
> planner can use for it's record counts? some way of forcibly
> maintaining a stat for every group?

No, but let's see what's in pg_stats.

> the groups are not related to one another - is it possible to
> partition them into their own indexes somehow?

Yes, but it will depend on having an explicit group_id=... clause in the
query as well as on the index. That's not going to help you here.

--
Richard Huxton
Archonet Ltd

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig Ringer 2008-04-14 08:37:54 Re: db size
Previous Message Vinubalaji Gopal 2008-04-14 06:55:14 Re: db size