Re: Sequential Scan with LIMIT

From: John Meinel <john(at)johnmeinel(dot)com>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential Scan with LIMIT
Date: 2004-10-25 14:33:38
Message-ID: 417D0EC2.5020206@johnmeinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Curt Sampson wrote:
> On Sun, 24 Oct 2004, John Meinel wrote:
>
>
>>I was looking into another problem, and I found something that surprised
>>me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
>>Now "col" is indexed...
>>The real purpose of this query is to check to see if a value exists in
>>the column,...
>
>
> When you select all the columns, you're going to force it to go to the
> table. If you select only the indexed column, it ought to be able to use
> just the index, and never read the table at all. You could also use more
> standard and more set-oriented SQL while you're at it:
>
> SELECT DISTINCT(col) FROM mytable WHERE col = 'myval'
>
> cjs

Well, what you wrote was actually much slower, as it had to scan the
whole table, grab all the rows, and then distinct them in the end.

However, this query worked:

SELECT DISTINCT(col) FROM mytable WHERE col = 'myval' LIMIT 1;

Now, *why* that works differently from:

SELECT col FROM mytable WHERE col = 'myval' LIMIT 1;
or
SELECT DISTINCT(col) FROM mytable WHERE col = 'myval';

I'm not sure. They all return the same information.

What's also weird is stuff like:
SELECT DISTINCT(NULL) FROM mytable WHERE col = 'myval' LIMIT 1;

Also searches the entire table, sorting that NULL == NULL wherever col =
'myval'. Which is as expensive as the non-limited case (I'm guessing
that the limit is occurring after the distinct, which is causing the
problem. SELECT NULL FROM ... still uses a sequential scan, but it stops
after finding the first one.)

Actually, in doing a little bit more testing, the working query only
works on some of the values. Probably it just increases the expense
enough that it switches over. It also has the downside that when it does
switch to seq scan, it is much more expensive as it has to do a sort and
a unique on all the entries.

John
=:->

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jan Wieck 2004-10-25 15:34:25 Re: ARC Memory Usage analysis
Previous Message Andrew Sullivan 2004-10-25 13:51:31 Re: Performance Anomalies in 7.4.5