Re: SourceForge & Postgres

From: mlw <markw(at)mohawksoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SourceForge & Postgres
Date: 2000-12-13 00:20:03
Message-ID: 3A36C0B3.ADAF69FE@mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
>
> mlw <markw(at)mohawksoft(dot)com> writes:
> > btw anyone trying this query should use: "attdispersion"
>
> Sorry about that --- I just copied-and-pasted the query from some notes
> that are obsolete as of 7.1...
>
> > cdinfo=# explain select * from ztitles where artistid = 100000220 ;
> > NOTICE: QUERY PLAN:
>
> > Seq Scan on ztitles (cost=0.00..4740.75 rows=3163 width=296)
>
> > And this is with "-o -fs"
>
> > Index Scan using ztitles_artistid_ndx on ztitles (cost=0.00..5915.01
> > rows=3163 width=296)
>
> > attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival
> > artistid | 0.0477198 | 19274 | 2 | 97 | 0 | 0.149362 | 100050450 | 100000000 | 100055325
>
> The reason why the thing is going for a sequential scan is that
> astonishingly high stacommonfrac statistic. Does artistid 100050450
> really account for 14.9% of all the rows in your table? (Who is that
> anyway? ;-)) If so, a search for artistid 100050450 definitely *should*
> use a sequential scan.

I tested this statement against the database and you are right, about 14
seconds with the index, 4 without.

BTW ID # 100050450 is "Various Artists"

This is sort of a point I was trying to make in previous emails. I think
this situation, and this sort of ratio is far more likely than the
attention it has been given.

In about every project I have used postgres I have run into this. It is
only recently that I have understood what the problem was and how to get
around it (sort of).

This one entry is destroying any intelligent performance we could hope
to attain. As I said, I always see this sort of behavior in some
implementation.

> The problem at hand is estimating the frequency
> of entries for some other artistid, given that we only have this much
> statistical info available. Obviously the stats are insufficient, and
> I hope to do something about that in a release or two, but it ain't
> gonna happen for 7.1. In the meantime, if you've got huge outliers
> like that, you could try reducing the value of NOT_MOST_COMMON_RATIO
> in src/backend/utils/adt/selfuncs.c.

I did some playing with this value, and I can seem to have it
differentiate between 100050450 and anything else.

--
http://www.mohawksoft.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message mlw 2000-12-13 00:41:36 Re: external function proposal for 7.2
Previous Message Nathan Myers 2000-12-12 23:51:43 Re: (one more time) Patches with vacuum fixes available .