Re: Problems with adding a is not null to a query.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Problems with adding a is not null to a query.
Date: 2011-01-17 23:55:49
Message-ID: 24112.1295308549@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Tim Uckun <timuckun(at)gmail(dot)com> writes:
>> Am I right in guessing that pg_stats.n_distinct is much too low for
>> the domain_id column?

> the domain_id is in the topical urls. A select count of domains shows
> that there are 700 domains, the pg_stats shows 170 which seems kind of
> low but maybe is not out of bounds because most domains are probably
> inactive.

It sounds like you've got one of those long-tail distributions where
there are lots and lots of the first couple hundred domains, and not
many at all of the last few. The problem with the stats as you've got
them is that the planner isn't aware of the long tail, so for this
specific domain id that's not even there at all, you're getting an
estimate of a couple of thousand matches --- which is why it goes for
the indexscan-according-to-ORDER-BY plan. It's figuring it will hit one
of those matches and be able to end the scan after reading much less
than all of the table.

As I've stated repeatedly, your next move needs to be to increase the
stats target, at least for that column if not globally. You probably
don't need to have it know about every last domain id, but you need to
have it know about enough that it realizes that domains not included in
the MCV list are going to appear less than a couple of thousand times.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tim Uckun 2011-01-18 00:19:17 Re: Problems with adding a is not null to a query.
Previous Message Tim Uckun 2011-01-17 22:14:06 Re: Problems with adding a is not null to a query.