Re: index vs. seq scan choice?

From: "George Pavlov" <gpavlov(at)mynewplace(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,<pgsql-general(at)postgresql(dot)org>
Subject: Re: index vs. seq scan choice?
Date: 2007-06-07 21:56:06
Message-ID: 8C5B026B51B6854CBE88121DBF097A86DEA6B4@ehost010-33.exch010.intermedia.net
Lists: pgsql-general, pgsql-www

> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us] 
> "George Pavlov" <gpavlov(at)mynewplace(dot)com> writes:
> > I am curious what could make the PA query to ignore the index. What are
> > the specific stats that are being used to make this decision?
> 
> you don't have the column's statistics target set high enough to
> track all the interesting values --- or maybe just not high enough to
> acquire sufficiently accurate frequency estimates for them.  
> Take a look at the pg_stats row for the column ...
> 
> (The default statistics target is 10, which is widely considered too
> low --- you might find 100 more suitable.)
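
For reference, the pg_stats lookup suggested above might look roughly
like the sketch below. The table and column names (listings, state) are
hypothetical placeholders, since the message does not name them; the
pg_stats columns themselves are the standard ones.

```sql
-- Hypothetical names: table "listings", column "state".
-- Shows which values the planner tracks individually and the
-- frequency it has estimated for each of them.
SELECT null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'listings'
  AND attname   = 'state';
```

A value that appears in most_common_vals is estimated from its matching
most_common_freqs entry; a value that is absent is estimated from what is
left over after the listed values are accounted for.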

Well, it seems it would actually be more beneficial for me to set it
LOWER than the default of 10. I get better performance when the stats
are less accurate, because the optimizer is then more likely to choose
the index! States that appear in pg_stats.most_common_vals most often
get a Seq Scan, whereas states that are not in the list always get the
Index Scan. Yet for all states, even the largest ones (15% of the data),
the Index Scan performs better.

For example, with SET STATISTICS 10 my benchmark query for a state like
Indiana (2981 rows, ~3% of total) runs in 132ms. If I SET STATISTICS
100, Indiana lands on the most_common_vals list for the column, the
query does a Seq Scan, and its run time jumps to 977ms! If I go the
other way and SET STATISTICS 1 (or 0), I can shrink the list to one
entry (setting it to 0 seems equivalent and still keeps the single most
common entry!?), and I get the Index Scan for every state except that
most common one.

But, of course, I don't want to undermine the whole stats mechanism. I
just want the system to use the index that is so helpful and brings run
times down by a factor of 4-8! What am I missing here?
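
The experiment described above can be sketched roughly as follows. This
is only an illustration: the table name (listings), the column name
(state), and the value 'IN' for Indiana are hypothetical, as the actual
schema and benchmark query are not shown in this message.

```sql
-- Hypothetical names: table "listings", column "state", value 'IN'.

-- Change the per-column statistics target, then re-gather stats so
-- the new target takes effect.
ALTER TABLE listings ALTER COLUMN state SET STATISTICS 100;
ANALYZE listings;

-- See which plan the planner picks and how long it really takes.
EXPLAIN ANALYZE
SELECT * FROM listings WHERE state = 'IN';

-- For comparison only: discourage sequential scans in this session
-- so the index plan can be timed on the same data.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM listings WHERE state = 'IN';
RESET enable_seqscan;
```

Comparing the estimated and actual row counts in the two EXPLAIN ANALYZE
outputs shows whether the row estimate or the cost model is steering the
planner toward the Seq Scan.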

George
