Skip site navigation (1) Skip section navigation (2)

An "obvious" index not being used

From: Daniele Varrazzo <piro(at)develer(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: An "obvious" index not being used
Date: 2008-06-19 01:07:11
Message-ID: (view raw or flat)
Lists: pgsql-performance

I am experiencing a query for which an useful index is not being used by 
PostgreSQL. The query is in the form:

     select count(*) from foo
     where foo.account_id in (
         select id from accounts where system = 'abc');

and the size of the tables it works on is:

   - 270 records in "accounts" 22 of which match the condition 'abc';
   - 5.3M records in "foo", 92K of which match the query condition.

There is an index in the field "foo.account_id" but is not used. The resulting 
query plan is:

  Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual
time=13412.088..13412.089 rows=1 loops=1)
    ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual
time=0.678..13307.074 rows=92790 loops=1)
          Hash Cond: (foo.account_id =
          ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4)
(actual time=0.014..7163.538 rows=5313514 loops=1)
          ->  Hash  (cost=11.70..11.70 rows=22 width=4) (actual
time=0.199..0.199 rows=22 loops=1)
                ->  Bitmap Heap Scan on accounts  (cost=1.42..11.70 rows=22
width=4) (actual time=0.092..0.160 rows=22 loops=1)
                      Recheck Cond: (("system")::text = 'abc'::text)
                      ->  Bitmap Index Scan on iaccounts_x1
(cost=0.00..1.42 rows=22 width=0) (actual time=0.077..0.077 rows=22
                            Index Cond: (("system")::text = 'abc'::text)
  Total runtime: 13412.226 ms

There is a seqscan on the large table. If seqscans are disabled, the plan 
becomes the more acceptable:

  Aggregate  (cost=2471979.99..2471980.00 rows=1 width=0) (actual
time=630.977..630.978 rows=1 loops=1)
    ->  Nested Loop  (cost=1258.12..2470897.61 rows=432953 width=0) (actual
time=0.164..526.174 rows=92790 loops=1)
          ->  HashAggregate  (cost=12.75..12.97 rows=22 width=4) (actual
time=0.131..0.169 rows=22 loops=1)
                ->  Bitmap Heap Scan on accounts  (cost=2.42..12.70 rows=22
width=4) (actual time=0.047..0.091 rows=22 loops=1)
                      Recheck Cond: (("system")::text = 'abc'::text)
                      ->  Bitmap Index Scan on iaccounts_x1
(cost=0.00..2.42 rows=22 width=0) (actual time=0.036..0.036 rows=22
                            Index Cond: (("system")::text = 'abc'::text)
          ->  Bitmap Heap Scan on foo  (cost=1245.37..111275.14 rows=83024
width=4) (actual time=3.086..14.391 rows=4218 loops=22)
                Recheck Cond: (foo.account_id =
                ->  Bitmap Index Scan on ifoo_x1  (cost=0.00..1224.61
rows=83024 width=0) (actual time=2.962..2.962 rows=4218 loops=22)
                      Index Cond: (foo.account_id =
  Total runtime: 631.121 ms

where the index "ifoo_x1" is used.

A similar query plan can be also obtained performing first the internal query 
and hardcoding the result in a new query:

     explain analyze select count(*) from foo
     where account_id in

I have tried to:

   - rewrite the query with a JOIN instead of an IN (no change in the plan),
   - rewrite the query using EXISTS (it gets worse),
   - raise the statistics for the foo.account_id field to 100 and to 1000,
   - decrease the random_page_cost down to 1,
   - vacuum-analyze the tables at each change,

none of which has changed the situation.

The system is an Ubuntu Hardy 64 bits running PG 8.3. The issue has been 
confirmed on Mac OS 1.5/PG 8.3. Although I made fewer tests on a PG 8.2 we 
recently switched from, I think the issue presents on that version too.

This is the first time I see the query planner failing a plan rather obvious: 
is there any other setting to tweak to force it to do good? (but a sensible 
tweaking: the random_page_cost to 1 was just a try to have the index used, 
nothing to be really put in production)

If you want to try the issue, an anonimized dataset is available on . The file size is 46MB (1.5GB 
uncompressed). Chris Mair, who tested it on Mac OS, also noticed that PG 
behaved correctly with the freshly imported data: as soon as he VACUUMed the 
database he started experiencing the described issue.

Thank you very much.

Daniele Varrazzo - Develer S.r.l.


pgsql-performance by date

Next:From: Tom LaneDate: 2008-06-19 01:43:34
Subject: Re: An "obvious" index not being used
Previous:From: Jaime CasanovaDate: 2008-06-18 23:19:06

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group