Re: Bug? 8.0 does not use partial index

From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Hansen <john(at)geeknet(dot)com(dot)au>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Bug? 8.0 does not use partial index
Date: 2005-01-14 00:41:46
Message-ID: C475FCAB18F868ED1CE84261@palle.girgensohn.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

--On torsdag, januari 13, 2005 18.55.11 -0500 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
wrote:

> Palle Girgensohn <girgen(at)pingpong(dot)net> writes:
>> --On torsdag, januari 13, 2005 18.18.37 -0500 Tom Lane
>> <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> So there's something nuts about the statistics in this case.
>
> On looking into it, it's the same old issue of not having column
> correlation statistics. pg_stats shows that ANALYZE estimated the
> fraction of rows with null group_id as 0.137667 (versus exact value
> of 0.147, not too bad considering I used the default statistics target)
> and it estimated the fraction with this_group_id = 46 as 0.358
> (vs actual 0.369, ditto). The problem is that it then estimates the
> total selectivity as 0.137667 * 0.358 or a bit under 5%, much too high
> to make an indexscan sensible. In reality there are only 4 rows with
> this combination of values, but the planner has no way to know that.
>
>> Anything I can do about it?
>
> I thought of a fairly miserable hack, which relies on the fact that 8.0
> does know how to accumulate statistics on functional indexes:
>
> group=# create index fooi on group_data (abs(this_group_id)) WHERE
> group_id IS NULL; CREATE INDEX
> group=# analyze group_data;
> ANALYZE
> group=# explain select * from group_data where group_id is null and
> abs(this_group_id) = 46; QUERY PLAN
> -------------------------------------------------------------------------
> ----- Index Scan using fooi on group_data (cost=0.00..5302.60 rows=1802
> width=42) Index Cond: (abs(this_group_id) = 46)
> Filter: (group_id IS NULL)
> (3 rows)
>
> (The choice of abs() is arbitrary, it just has to be something other
> than the unadorned column.) In this situation the planner will look at
> the stats for the functional index and discover that in that index there
> aren't many 46's, so it comes out with a more reasonable rowcount
> estimate.

OK, I think I understand. And this is changed between 7.4.x and 8.0?

> We should probably make it accumulate stats on partial indexes even when
> the index columns aren't expressions. This example shows that useful
> stats can be derived that way. Too late for 8.0 though...

True, but for next version, perhaps? :)

Trying all this out, I realize that on 7.4.5, I can sometimes get different
results after `vacuum analyze' vs. a plain `analyze' (again, not exactly
the same data, and I cannot reproduce this on the other machine with the
data I sent you). It does not really relate to the question above, but
perhaps you can explain how come I get different results?

I join with a table person, group_data.item_text has person.userid as
foreign key constraint:

7.4.5:

pp=# vacuum analyze group_data;
VACUUM
Time: 256353,802 ms
pp=# select
pp-# distinct p.last_name,
pp-# p.userid
pp-# from
pp-# group_data gd join person p on (p.userid = gd.item_text)
pp-# where
pp-# gd.this_group_id = 46
pp-# and gd.group_id is null;
last_name | userid
-----------+----------
Lastname | u1wmd5nn
(1 row)

Time: 6223,123 ms
pp=# explain analyze
pp-# select
pp-# distinct p.last_name,
pp-# p.userid
pp-# from
pp-# group_data gd join person p on (p.userid = gd.item_text)
pp-# where
pp-# gd.this_group_id = 46
pp-# and gd.group_id is null;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=76016.57..77215.19 rows=92632 width=23) (actual
time=7649.496..7649.512 rows=1 loops=1)
-> Sort (cost=76016.57..76416.11 rows=159816 width=23) (actual
time=7649.481..7649.487 rows=1 loops=1)
Sort Key: p.last_name, p.userid
-> Hash Join (cost=3003.90..62203.64 rows=159816 width=23)
(actual time=7649.254..7649.435 rows=1 loops=1)
Hash Cond: ("outer".item_text = "inner".userid)
-> Seq Scan on group_data gd (cost=0.00..53238.10
rows=160565 width=12) (actual time=431.078..5927.410 rows=5 loops=1)
Filter: ((this_group_id = 46) AND (group_id IS NULL))
-> Hash (cost=2229.32..2229.32 rows=92632 width=23)
(actual time=1555.797..1555.797 rows=0 loops=1)
-> Seq Scan on person p (cost=0.00..2229.32
rows=92632 width=23) (actual time=0.093..856.728 rows=92632 loops=1)
Total runtime: 7652.771 ms
(10 rows)

Time: 7656,909 ms
pp=# select * from group_data where this_group_id=46 and group_id is null;
this_group_id | group_id | item_text | item_int | link_path
---------------+----------+-----------+----------+-----------
46 | | | 1223 | :46:
46 | | | 1228 | :46:
46 | | | 1328 | :46:
46 | | | 1391 | :46:
46 | | u1wmd5nn | | :46:
(5 rows)

Time: 5891,716 ms
pp=# analyze group_data;
ANALYZE
Time: 3210,096 ms
pp=# explain select
pp-# distinct p.last_name,
pp-# p.userid
pp-# from
pp-# group_data gd join person p on (p.userid = gd.item_text)
pp-# where
pp-# gd.this_group_id = 46
pp-# and gd.group_id is null;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Unique (cost=111417.47..113761.30 rows=92632 width=23)
-> Sort (cost=111417.47..112198.75 rows=312510 width=23)
Sort Key: p.last_name, p.userid
-> Hash Join (cost=3003.90..79231.40 rows=312510 width=23)
Hash Cond: ("outer".item_text = "inner".userid)
-> Index Scan using group_data_tgid_gidnull_idx on
group_data gd (cost=0.00..65091.35 rows=275225 width=11)
Index Cond: (this_group_id = 46)
Filter: (group_id IS NULL)
-> Hash (cost=2229.32..2229.32 rows=92632 width=23)
-> Seq Scan on person p (cost=0.00..2229.32
rows=92632 width=23)
(10 rows)

Time: 6,647 ms
pp=# select
pp-# distinct p.last_name,
pp-# p.userid
pp-# from
pp-# group_data gd join person p on (p.userid = gd.item_text)
pp-# where
pp-# gd.this_group_id = 46
pp-# and gd.group_id is null;
last_name | userid
-----------+----------
Lastname | u1wmd5nn
(1 row)

Time: 772,969 ms
pp=# select
pp-# distinct p.last_name,
pp-# p.userid
pp-# from
pp-# person p, group_data gd
pp-# where
pp-# p.userid = gd.item_text
pp-# and gd.this_group_id = 46
pp-# and gd.group_id is null
pp-# ;
last_name | userid
-----------+----------
Lastname | u1wmd5nn
(1 row)

Time: 720,345 ms

/Palle

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2005-01-14 00:44:44 Re: [HACKERS] Much Ado About COUNT(*)
Previous Message Tom Lane 2005-01-14 00:32:38 Re: Bug? 8.0 does not use partial index