Skip site navigation (1) Skip section navigation (2)

Re: Sequential Scan with LIMIT

From: John Meinel <john(at)johnmeinel(dot)com>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential Scan with LIMIT
Date: 2004-10-25 14:33:38
Message-ID: 417D0EC2.5020206@johnmeinel.com (view raw or flat)
Thread:
Lists: pgsql-performance
Curt Sampson wrote:
> On Sun, 24 Oct 2004, John Meinel wrote:
> 
> 
>>I was looking into another problem, and I found something that surprised
>>me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
>>Now "col" is indexed...
>>The real purpose of this query is to check to see if a value exists in
>>the column,...
> 
> 
> When you select all the columns, you're going to force it to go to the
> table. If you select only the indexed column, it ought to be able to use
> just the index, and never read the table at all. You could also use more
> standard and more set-oriented SQL while you're at it:
> 
>     SELECT DISTINCT(col) FROM mytable WHERE col = 'myval'
> 
> cjs

Well, what you wrote was actually much slower, as it had to scan the 
whole table, grab all the rows, and then distinct them in the end.

However, this query worked:


	SELECT DISTINCT(col) FROM mytable WHERE col = 'myval' LIMIT 1;


Now, *why* that works differently from:

SELECT col FROM mytable WHERE col = 'myval' LIMIT 1;
or
SELECT DISTINCT(col) FROM mytable WHERE col = 'myval';

I'm not sure. They all return the same information.

What's also weird is stuff like:
SELECT DISTINCT(NULL) FROM mytable WHERE col = 'myval' LIMIT 1;

Also searches the entire table, sorting that NULL == NULL wherever col = 
'myval'. Which is as expensive as the non-limited case (I'm guessing 
that the limit is occurring after the distinct, which is causing the 
problem. SELECT NULL FROM ... still uses a sequential scan, but it stops 
after finding the first one.)


Actually, in doing a little bit more testing, the working query only 
works on some of the values. Probably it just increases the expense 
enough that it switches over. It also has the downside that when it does 
switch to seq scan, it is much more expensive as it has to do a sort and 
a unique on all the entries.

John
=:->

In response to

Responses

pgsql-performance by date

Next:From: Jan WieckDate: 2004-10-25 15:34:25
Subject: Re: ARC Memory Usage analysis
Previous:From: Andrew SullivanDate: 2004-10-25 13:51:31
Subject: Re: Performance Anomalies in 7.4.5

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group