Skip site navigation (1) Skip section navigation (2)

Re: will the planner ever use an index when the condition is <> ?

From: Roxanne Reid-Bennett <rox(at)tara-lu(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: will the planner ever use an index when the condition is <> ?
Date: 2011-12-18 14:52:14
Message-ID: 4EEDFE1E.9070303@tara-lu.com (view raw or flat)
Thread:
Lists: pgsql-performance
On 12/17/2011 11:24 AM, Filip Rembiałkowski wrote:
> Normally there is no chance it could work,
> because (a) the planner does not know all possible values of a column,
> and (b) btree indexes cannot search on "not equal" operator.

Is there an index type that can check "not equal"?
This specific column has a limited number of possible values - it is 
essentially an enumerated list.

Roxanne
>
>
> BTW I've just made a case where - logically - it could work, but it
> still does not:
>
> create table nums ( num int4 not null, check(num=1 or num=2) );
> insert into nums select case when random()<=0.99 then 1 else 2 end
> from generate_series(1,1000000);
> create index nums_idx on nums(num);
> analyze nums;
> set constraint_exclusion to 'on';
> explain select * from nums where num<>1;
> --planner could estimate selectivity as 1%, and use index with "=2"
> filter basing on check constraint?
>
>
>
>
> 2011/12/17 Roxanne Reid-Bennett<rox(at)tara-lu(dot)com>:
>> I have a query that used<>  against an indexed column. In this
>> case I can use the reverse and use in or = and get the performance
>> I need... but "in general"... will the planner ever use an index when the
>> related column is compared using<>?
>>
>> I feel like the answer is no, but wanted to ask.
>>
>> Roxanne
>> Postgres Version 8.4.9 PostGIS version 1.5.2
>>
>>
>>
>> Context for question:
>>
>> I have the following query:
>>
>> select *
>> from op_region opr, yield_segment_info ysi, data_location dl
>> where opr.op_region_id in
>>          (select distinct op_region_id
>>           from yield_point
>>           where yield>  0
>>           and area>  0
>>           and ST_GeometryType(location)<>  'ST_Point'
>>          )
>> and ysi.op_region_id = opr.op_region_id
>> and dl.data_set_id = opr.data_set_id
>>
>> Yield_Point has 161,575,599 records
>> where yield>0 and area>  0 has 161,263,193 records,
>> where ST_GeometryType(location)<>  'ST_Point' has just 231 records
>>
>> yield_segment_info has 165,929 records
>> op_region has 566,212 records
>> data_location has 394,763
>>
>> All of these have a high volume of insert/delete's.
>> The tables have recently been vacuum full'd and the indexes reindexed.
>> [they are under the management of the autovacuum, but we forced a cleanup on
>> the chance that things had degraded...]
>>
>> If I run an explain analyze:
>>
>> "Nested Loop
>>     (cost=5068203.00..5068230.31 rows=3 width=225308)
>>     (actual time=192571.730..193625.728 rows=236 loops=1)"
>> "->Nested Loop
>>       (cost=5068203.00..5068219.66 rows=1 width=57329)
>>       (actual time=192522.573..192786.698 rows=230 loops=1)"
>> "  ->Nested Loop
>>         (cost=5068203.00..5068211.36 rows=1 width=57268)
>>         (actual time=192509.822..192638.446 rows=230 loops=1)"
>> "    ->HashAggregate
>>            (cost=5068203.00..5068203.01 rows=1 width=4)
>>            (actual time=192471.507..192471.682 rows=230 loops=1)"
>> "       ->Seq Scan on yield_point
>>               (cost=0.00..5068203.00 rows=1 width=4)
>>               (actual time=602.174..192471.177 rows=230 loops=1)"
>> "             Filter: ((yield>  0::double precision) AND
>>                        (area>  0::double precision) AND
>>                        (st_geometrytype(location)<>  'ST_Point'::text))"
>> "    ->Index Scan using op_region_pkey on op_region opr
>>            (cost=0.00..8.33 rows=1 width=57264)
>>            (actual time=0.723..0.723 rows=1 loops=230)"
>> "          Index Cond: (opr.op_region_id = yield_point.op_region_id)"
>> "  ->Index Scan using yield_segment_info_key on yield_segment_info ysi
>>        (cost=0.00..8.29 rows=1 width=65)
>>        (actual time=0.643..0.643 rows=1 loops=230)"
>> "      Index Cond: (ysi.op_region_id = opr.op_region_id)"
>> "->Index Scan using data_location_data_set_idx on data_location dl
>>     (cost=0.00..10.61 rows=3 width=167979)
>>     (actual time=3.611..3.646 rows=1 loops=230)"
>> "Index Cond: (dl.data_set_id = opr.data_set_id)"
>> "Total runtime: 193625.955 ms"
>>
>> yield_point has the following indexes:
>>       btree on ST_GeometryType(location)
>>       gist on location
>>       btree on op_region_id
>>
>> I've also tried an index on
>>       ((yield>  0::double precision) AND (area>  0::double precision) AND
>> (st_geometrytype(location)<>  'ST_Point'::text))
>> ... it still goes for the sequential scan.
>>
>> But if I change it to st_geometrytype(location) = 'ST_Polygon' or
>> even in ('ST_Polygon','ST_MultiPolygon')
>>
>> the planner uses the index.
>>
>> Roxanne
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance


In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2011-12-18 18:31:26
Subject: Re: will the planner ever use an index when the condition is <> ?
Previous:From: Віталій ТимчишинDate: 2011-12-18 10:41:21
Subject: Re: will the planner ever use an index when the condition is <> ?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group