Skip site navigation (1) Skip section navigation (2)

Re: tricky query

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: John A Meinel <john(at)arbash-meinel(dot)com>
Cc: Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: tricky query
Date: 2005-06-28 15:42:28
Message-ID: 42C16FE4.2020803@arbash-meinel.com (view raw or flat)
Thread:
Lists: pgsql-performance
John A Meinel wrote:

> Merlin Moncure wrote:
>
>> I need a fast way (sql only preferred) to solve the following problem:
>>
>> I need the smallest integer that is greater than zero that is not in the
>> column of a table.  In other words, if an 'id' column has values
>> 1,2,3,4,6 and 7, I need a query that returns the value of 5.
>>
>> I've already worked out a query using generate_series (not scalable) and
>> pl/pgsql.  An SQL only solution would be preferred, am I missing
>> something obvious?
>>
>> Merlin
>>
>>
>
> Not so bad. Try something like this:
>
> SELECT min(id+1) as id_new FROM table
>    WHERE (id+1) NOT IN (SELECT id FROM table);
>
> Now, this requires probably a sequential scan, but I'm not sure how you
> can get around that.
> Maybe if you got trickier and did some ordering and limits. The above
> seems to give the right answer, though.
>
> I don't know how big you want to scale to.
>
> You might try something like:
> SELECT id+1 as id_new FROM t
>    WHERE (id+1) NOT IN (SELECT id FROM t)
>    ORDER BY id LIMIT 1;
>
> John
> =:->

Well, I was able to improve it to using appropriate index scans.
Here is the query:

SELECT t1.id+1 as id_new FROM id_test t1
    WHERE NOT EXISTS
        (SELECT t2.id FROM id_test t2 WHERE t2.id = t1.id+1)
    ORDER BY t1.id LIMIT 1;

I created a test table which has 90k randomly inserted rows. And this is
what EXPLAIN ANALYZE says:

                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..12.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Index Scan using id_test_pkey on id_test t1  (cost=0.00..544423.27 rows=45000 width=4) (actual time=0.000..0.000 rows=1 loops=1)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using id_test_pkey on id_test t2  (cost=0.00..6.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=15)
                 Index Cond: (id = ($0 + 1))
 Total runtime: 0.000 ms
(7 rows)

The only thing I have is a primary key index on id_test(id);

John
=:->

In response to

Responses

pgsql-performance by date

Next:From: Karl O. PincDate: 2005-06-28 15:42:32
Subject: Fwd: Re: [PERFORM] Performance analysis of plpgsql code [kop@meme.com]
Previous:From: Merlin MoncureDate: 2005-06-28 15:39:53
Subject: Re: tricky query

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group