On Thu, Oct 28, 2004 at 07:49:28PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel(at)decibel(dot)org> writes:
> > On Sun, Oct 24, 2004 at 04:11:53PM -0400, Tom Lane wrote:
> >> The test case you are showing is probably suffering from nonrandom
> >> placement of this particular data value; which is something that the
> >> statistics we keep are too crude to detect.
> > Isn't that exactly what pg_stats.correlation is?
> No. A far-from-zero correlation gives you a clue that on average, *all*
> the data values are placed nonrandomly ... but it doesn't really tell
> you much one way or the other about a single data value.
Maybe I'm confused about what the original issue was then... it appeared
that you were suggesting PGSQL was doing a seq scan instead of an index
scan because it thought it would find it on the first page if the data
was randomly distributed. If the correlation is highly non-zero though,
shouldn't it 'play it safe' and assume that unless it's picking the min
or max value stored in statistics it will be better to do an index scan,
since the value it's looking for is probably in the middle of the table
somewhere? IE: if the values in the field are between 1 and 5 and the
table is clustered on that field then clearly an index scan would be
better to find a row with field=3 than a seq scan.
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
In response to
pgsql-performance by date
|Next:||From: Patrick Hatcher||Date: 2004-10-29 14:04:53|
|Subject: determining max_fsm_pages|
|Previous:||From: Tom Lane||Date: 2004-10-28 23:49:28|
|Subject: Re: Sequential Scan with LIMIT |