Skip site navigation (1) Skip section navigation (2)

Re: Bug? 8.0 does not use partial index

From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Hansen <john(at)geeknet(dot)com(dot)au>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Bug? 8.0 does not use partial index
Date: 2005-01-14 00:41:46
Message-ID: C475FCAB18F868ED1CE84261@palle.girgensohn.se (view raw or flat)
Thread:
Lists: pgsql-hackers
--On torsdag, januari 13, 2005 18.55.11 -0500 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 
wrote:

> Palle Girgensohn <girgen(at)pingpong(dot)net> writes:
>> --On torsdag, januari 13, 2005 18.18.37 -0500 Tom Lane
>> <tgl(at)sss(dot)pgh(dot)pa(dot)us>  wrote:
>>> So there's something nuts about the statistics in this case.
>
> On looking into it, it's the same old issue of not having column
> correlation statistics.  pg_stats shows that ANALYZE estimated the
> fraction of rows with null group_id as 0.137667 (versus exact value
> of 0.147, not too bad considering I used the default statistics target)
> and it estimated the fraction with this_group_id = 46 as 0.358
> (vs actual 0.369, ditto).  The problem is that it then estimates the
> total selectivity as 0.137667 * 0.358 or a bit under 5%, much too high
> to make an indexscan sensible.  In reality there are only 4 rows with
> this combination of values, but the planner has no way to know that.
>
>> Anything I can do about it?
>
> I thought of a fairly miserable hack, which relies on the fact that 8.0
> does know how to accumulate statistics on functional indexes:
>
> group=# create index fooi on group_data (abs(this_group_id)) WHERE
> group_id IS NULL; CREATE INDEX
> group=# analyze group_data;
> ANALYZE
> group=# explain select * from group_data where group_id is null and
> abs(this_group_id) = 46;                                   QUERY PLAN
> -------------------------------------------------------------------------
> -----  Index Scan using fooi on group_data  (cost=0.00..5302.60 rows=1802
> width=42)    Index Cond: (abs(this_group_id) = 46)
>    Filter: (group_id IS NULL)
> (3 rows)
>
> (The choice of abs() is arbitrary, it just has to be something other
> than the unadorned column.)  In this situation the planner will look at
> the stats for the functional index and discover that in that index there
> aren't many 46's, so it comes out with a more reasonable rowcount
> estimate.

OK, I think I understand. And this is changed between 7.4.x and 8.0?

> We should probably make it accumulate stats on partial indexes even when
> the index columns aren't expressions.  This example shows that useful
> stats can be derived that way.  Too late for 8.0 though...

True, but for next version, perhaps? :)

Trying all this out, I realize that on 7.4.5, I can sometimes get different 
results after `vacuum analyze' vs. a plain `analyze' (again, not exactly 
the same data, and I cannot reproduce this on the other machine with the 
data I sent you). It does not really relate to the question above, but 
perhaps you can explain how come I get different results?

I join with a table person, group_data.item_text has person.userid as 
foreign key constraint:

7.4.5:

pp=# vacuum analyze group_data;
VACUUM
Time: 256353,802 ms
pp=# select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   group_data gd join person p on (p.userid = gd.item_text)
pp-#  where
pp-#   gd.this_group_id = 46
pp-#   and gd.group_id is null;
 last_name |  userid
-----------+----------
 Lastname  | u1wmd5nn
(1 row)

Time: 6223,123 ms
pp=# explain analyze
pp-# select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   group_data gd join person p on (p.userid = gd.item_text)
pp-#  where
pp-#   gd.this_group_id = 46
pp-#   and gd.group_id is null;
                                                               QUERY PLAN 

----------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=76016.57..77215.19 rows=92632 width=23) (actual 
time=7649.496..7649.512 rows=1 loops=1)
   ->  Sort  (cost=76016.57..76416.11 rows=159816 width=23) (actual 
time=7649.481..7649.487 rows=1 loops=1)
         Sort Key: p.last_name, p.userid
         ->  Hash Join  (cost=3003.90..62203.64 rows=159816 width=23) 
(actual time=7649.254..7649.435 rows=1 loops=1)
               Hash Cond: ("outer".item_text = "inner".userid)
               ->  Seq Scan on group_data gd  (cost=0.00..53238.10 
rows=160565 width=12) (actual time=431.078..5927.410 rows=5 loops=1)
                     Filter: ((this_group_id = 46) AND (group_id IS NULL))
               ->  Hash  (cost=2229.32..2229.32 rows=92632 width=23) 
(actual time=1555.797..1555.797 rows=0 loops=1)
                     ->  Seq Scan on person p  (cost=0.00..2229.32 
rows=92632 width=23) (actual time=0.093..856.728 rows=92632 loops=1)
 Total runtime: 7652.771 ms
(10 rows)

Time: 7656,909 ms
pp=# select * from group_data where this_group_id=46 and group_id is null;
 this_group_id | group_id | item_text | item_int | link_path
---------------+----------+-----------+----------+-----------
            46 |          |           |     1223 | :46:
            46 |          |           |     1228 | :46:
            46 |          |           |     1328 | :46:
            46 |          |           |     1391 | :46:
            46 |          | u1wmd5nn  |          | :46:
(5 rows)

Time: 5891,716 ms
pp=# analyze group_data;
ANALYZE
Time: 3210,096 ms
pp=# explain select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   group_data gd join person p on (p.userid = gd.item_text)
pp-#  where
pp-#   gd.this_group_id = 46
pp-#   and gd.group_id is null;
                                                         QUERY PLAN 

-----------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=111417.47..113761.30 rows=92632 width=23)
   ->  Sort  (cost=111417.47..112198.75 rows=312510 width=23)
         Sort Key: p.last_name, p.userid
         ->  Hash Join  (cost=3003.90..79231.40 rows=312510 width=23)
               Hash Cond: ("outer".item_text = "inner".userid)
               ->  Index Scan using group_data_tgid_gidnull_idx on 
group_data gd  (cost=0.00..65091.35 rows=275225 width=11)
                     Index Cond: (this_group_id = 46)
                     Filter: (group_id IS NULL)
               ->  Hash  (cost=2229.32..2229.32 rows=92632 width=23)
                     ->  Seq Scan on person p  (cost=0.00..2229.32 
rows=92632 width=23)
(10 rows)

Time: 6,647 ms
pp=# select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   group_data gd join person p on (p.userid = gd.item_text)
pp-#  where
pp-#   gd.this_group_id = 46
pp-#   and gd.group_id is null;
 last_name |  userid
-----------+----------
 Lastname  | u1wmd5nn
(1 row)

Time: 772,969 ms
pp=# select
pp-#   distinct p.last_name,
pp-#   p.userid
pp-#  from
pp-#   person p, group_data gd
pp-#  where
pp-#   p.userid = gd.item_text
pp-#   and gd.this_group_id = 46
pp-#   and gd.group_id is null
pp-# ;
 last_name |  userid
-----------+----------
 Lastname  | u1wmd5nn
(1 row)

Time: 720,345 ms


/Palle



In response to

Responses

pgsql-hackers by date

Next:From: Greg StarkDate: 2005-01-14 00:44:44
Subject: Re: [HACKERS] Much Ado About COUNT(*)
Previous:From: Tom LaneDate: 2005-01-14 00:32:38
Subject: Re: Bug? 8.0 does not use partial index

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group