Skip site navigation (1) Skip section navigation (2)

Re: Sequential Scan with LIMIT

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Meinel <john(at)johnmeinel(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential Scan with LIMIT
Date: 2004-10-28 23:54:30
Message-ID: 20041028235430.GJ55164@decibel.org (view raw or flat)
Thread:
Lists: pgsql-performance
On Thu, Oct 28, 2004 at 07:49:28PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel(at)decibel(dot)org> writes:
> > On Sun, Oct 24, 2004 at 04:11:53PM -0400, Tom Lane wrote:
> >> The test case you are showing is probably suffering from nonrandom
> >> placement of this particular data value; which is something that the
> >> statistics we keep are too crude to detect.
>  
> > Isn't that exactly what pg_stats.correlation is?
> 
> No.  A far-from-zero correlation gives you a clue that on average, *all*
> the data values are placed nonrandomly ... but it doesn't really tell
> you much one way or the other about a single data value.
 
Maybe I'm confused about what the original issue was then... it appeared
that you were suggesting PGSQL was doing a seq scan instead of an index
scan because it thought it would find it on the first page if the data
was randomly distributed. If the correlation is highly non-zero though,
shouldn't it 'play it safe' and assume that unless it's picking the min
or max value stored in statistics it will be better to do an index scan,
since the value it's looking for is probably in the middle of the table
somewhere? IE: if the values in the field are between 1 and 5 and the
table is clustered on that field then clearly an index scan would be
better to find a row with field=3 than a seq scan.
-- 
Jim C. Nasby, Database Consultant               decibel(at)decibel(dot)org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

pgsql-performance by date

Next:From: Patrick HatcherDate: 2004-10-29 14:04:53
Subject: determining max_fsm_pages
Previous:From: Tom LaneDate: 2004-10-28 23:49:28
Subject: Re: Sequential Scan with LIMIT

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group