
Re: Sequential Scan with LIMIT

From: John Meinel <john(at)johnmeinel(dot)com>
To: Jaime Casanova <systemguards(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequential Scan with LIMIT
Date: 2004-10-26 22:21:35
Message-ID: 417ECDEF.8030108@johnmeinel.com
Lists: pgsql-performance
Jaime Casanova wrote:
[...]
>>
>>I'm not sure. They all return the same information.
> 
> 
> of course, both queries will return the same but
> that's just because you forced it.
> 
> LIMIT and DISTINCT are different things, so they
> behave and are planned differently.
> 
> 
> 
>>What's also weird is stuff like:
>>SELECT DISTINCT(NULL) FROM mytable WHERE col =
>>'myval' LIMIT 1;
> 
> 
> why do you want to do such a thing?
> 
> regards,
> Jaime Casanova
> 

I was trying to see if selecting a constant would change things.
I could have done SELECT DISTINCT(1) or just SELECT 1 FROM ...
The idea of the query is to return something different when 'myval' 
exists in the table than when it does not. If you are writing a 
function, you can use:

PERFORM 1 FROM mytable WHERE col = 'myval';
IF FOUND THEN
   -- do a (the value exists)
ELSE
   -- do b (it does not)
END IF;
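
For example, a complete function along these lines might look like the 
following (just a sketch; the function name myval_exists and the integer 
type for col are my assumptions, not something from the original table):

CREATE OR REPLACE FUNCTION myval_exists(integer) RETURNS boolean AS '
BEGIN
    -- PERFORM runs the query for its side effects only and sets FOUND
    PERFORM 1 FROM mytable WHERE col = $1;
    RETURN FOUND;
END;
' LANGUAGE plpgsql;

SELECT myval_exists(555647);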

The whole point of this exercise was just to find the cheapest query 
for testing whether a value exists in a column. The only thing I've 
found that works for my column is:

SET enable_seqscan TO off;
SELECT col FROM mytable WHERE col = 'myval' LIMIT 1;
SET enable_seqscan TO on;
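
If you prefer the setting to revert automatically, SET LOCAL confines 
it to the current transaction (just a variation on the same trick):

BEGIN;
SET LOCAL enable_seqscan TO off;
SELECT col FROM mytable WHERE col = 'myval' LIMIT 1;
COMMIT;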

My column is not well distributed (larger numbers occur later in the 
dataset, but may occur many times). In total there are something like 
500,000 rows; the number 555647 occurs 100,000 times, but not until 
around row 300,000.

The analyzer looks at the data and says "1/5th of the time it is 555647, 
so I can just do a sequential scan; the odds are I won't have to look 
very long, and I avoid loading the index". It turns out this is very 
bad here: with an index you only have to do 2 page loads, instead of 
reading 300,000 rows.
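
You can check what the planner believes about the column's distribution 
by querying the pg_stats view (table and column names as in the example 
above):

SELECT most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'mytable' AND attname = 'col';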

Obviously this isn't a general-case solution. But if you have a 
situation similar to mine, it might be useful.

(That's one thing with DB tuning. It seems to be very situation 
dependent, and it's hard to plan without a real dataset.)

John
=:->
