Re: Sequential Scan with LIMIT

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Meinel <john(at)johnmeinel(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential Scan with LIMIT
Date: 2004-10-28 23:54:30
Message-ID: 20041028235430.GJ55164@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Oct 28, 2004 at 07:49:28PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel(at)decibel(dot)org> writes:
> > On Sun, Oct 24, 2004 at 04:11:53PM -0400, Tom Lane wrote:
> >> The test case you are showing is probably suffering from nonrandom
> >> placement of this particular data value; which is something that the
> >> statistics we keep are too crude to detect.
>
> > Isn't that exactly what pg_stats.correlation is?
>
> No. A far-from-zero correlation gives you a clue that on average, *all*
> the data values are placed nonrandomly ... but it doesn't really tell
> you much one way or the other about a single data value.

Maybe I'm confused about what the original issue was then... it appeared
that you were suggesting PGSQL was doing a seq scan instead of an index
scan because it thought it would find it on the first page if the data
was randomly distributed. If the correlation is highly non-zero though,
shouldn't it 'play it safe' and assume that unless it's picking the min
or max value stored in statistics it will be better to do an index scan,
since the value it's looking for is probably in the middle of the table
somewhere? IE: if the values in the field are between 1 and 5 and the
table is clustered on that field then clearly an index scan would be
better to find a row with field=3 than a seq scan.
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Patrick Hatcher 2004-10-29 14:04:53 determining max_fsm_pages
Previous Message Tom Lane 2004-10-28 23:49:28 Re: Sequential Scan with LIMIT