Re: Sequential Scan with LIMIT

From: John Meinel <john(at)johnmeinel(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential Scan with LIMIT
Date: 2004-10-24 21:51:24
Message-ID: 417C23DC.1090407@johnmeinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> John Meinel <john(at)johnmeinel(dot)com> writes:
>
>>I was looking into another problem, and I found something that surprised
>>me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
>>Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs
>>maybe 100,000 times. Without the LIMIT, this query should definitely do
>>a sequential scan.
>
>
>>But with the LIMIT, doesn't it know that it will return at max 1 value,
>>and thus be able to use the index?
>
>
> But the LIMIT will cut the cost of the seqscan case too. Given the
> numbers you posit above, about one row in five will have 'myval', so a
> seqscan can reasonably expect to hit the first matching row in the first
> page of the table. This is still cheaper than doing an index scan
> (which must require reading at least one index page plus at least one
> table page).
>
> The test case you are showing is probably suffering from nonrandom
> placement of this particular data value; which is something that the
> statistics we keep are too crude to detect.
>
> regards, tom lane

You are correct about non-random placement. I'm a little surprised it
doesn't change with values, then. For instance,

# select count(*) from finst_t where store_id = 52;
13967

Still does a sequential scan for the "select id from..." query.

The only value it does an index query for is 9605 which only has 1 row.

It estimates ~18,000 rows, but that is still < 3% of the total data.

This row corresponds to disk location where files can be found. So when
a storage location fills up, generally a new one is created. This means
that *generally* the numbers will be increasing as you go further in the
table (not guaranteed, as there are multiple locations open at any one
time).

Am I better off in this case just wrapping my query with:

set enable_seqscan to off;
query
set enable_seqscan to on;

There is still the possibility that there is a better way to determine
existence of a value in a column. I was wondering about something like:

SELECT 1 WHERE EXISTS
(SELECT id FROM finst_t WHERE store_id=52 LIMIT 1);

Though the second part is the same, so it still does the sequential scan.

This isn't critical, I was just trying to understand what's going on.
Thanks for your help.

John
=:->

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Rod Dutton 2004-10-24 22:00:42 Reindexdb and REINDEX
Previous Message Tom Lane 2004-10-24 20:11:53 Re: Sequential Scan with LIMIT