Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

From: Mark <mwchambers(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Date: 2018-12-21 16:39:07
Message-ID: CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi All,

I'm seeing some very slow queries and it looks like the query planner is
deciding to do a 'Nested Loop Left Join'
which is slow. When I SET enable_nestloop=OFF for testing it does a 'Hash
Left Join' which is much faster.

I think I can see the cause of the problem in the examples below. I can't
make sense of the statistics being written
by ANALYSE or the planners row estimates.

Can anyone help me understand....

- Why the row estimate I get in the query below for school_id = 36 is 1 ?
( I expect it to be higher)
- Why does '1' appear in the most_common_vals when it is actually the least
common value.
- Why doesn't 36 appear in the most_common_vals (it is more common than 1)
- Does the analyse output below mean that it only scanned 51538 of 65463
rows in the table? Is school_id 36 just being missed in the sample? (This
happens when the analyse is repeated )

Any help with understanding what's happening here would be much
appreciated.

I hope I've provided enough information below.

Thanks,

Mark

db=> explain analyse select * from common_student where school_id = 36 ;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_common_student_sid on common_student
(cost=0.41..8.39 rows=1 width=385) (actual time=0.264..1.691 rows=1388
loops=1)
Index Cond: (school_id = 36)
Planning time: 0.087 ms
Execution time: 2.706 ms
(4 rows)

db=> select tablename,attname,most_common_vals,histogram_bounds,n_distinct
from pg_stats where attname='school_id' and tablename='common_stude
nt';
tablename | attname |
most_common_vals
| histogram_bounds | n_distinct
----------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
common_student | school_id |
{79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1}
| | 45
(1 row)

db=> select count(distinct(school_id)) from common_student ;
count
-------
55
(1 row)

db=> alter table common_student alter column school_id set statistics
100000;
WARNING: lowering statistics target to 10000
ALTER TABLE
db=> analyse verbose common_student(school_id);
INFO: analyzing "public.common_student"
INFO: "common_student": scanned 7322 of 7322 pages, containing 65463 live
rows and 49026 dead rows; 51538 rows in sample, 65463 estimated total rows
ANALYZE
db=> select tablename,attname,most_common_vals,histogram_bounds,n_distinct
from pg_stats where attname='school_id' and tablename='common_stude
nt';
tablename | attname |
most_common_vals
| histogram_bounds | n_distinct
----------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
common_student | school_id |
{79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1}
| | 45
(1 row)

db=> explain analyse select * from common_student where school_id = 36 ;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_common_student_sid on common_student
(cost=0.41..8.39 rows=1 width=385) (actual time=0.542..4.022 rows=1388
loops=1)
Index Cond: (school_id = 36)
Planning time: 0.334 ms
Execution time: 6.542 ms
(4 rows)

db=> select school_id, count(*) from common_student group by school_id
order by count(*) limit 6 ;
school_id | count
-----------+-------
1 | 50
88 | 161
53 | 252
94 | 422
31 | 434
68 | 454
(6 rows)

dvpjxbzc=> select school_id, count(*) from common_student where school_id =
36 group by school_id ;
school_id | count
-----------+-------
36 | 1388
(1 row)

db=> explain analyse select * from common_student where school_id = 1 ;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=4.91..243.76 rows=64 width=385)
(actual time=0.128..0.267 rows=50 loops=1)
Recheck Cond: (school_id = 1)
Heap Blocks: exact=16
-> Bitmap Index Scan on "unique common_student" (cost=0.00..4.90
rows=64 width=0) (actual time=0.110..0.110 rows=50 loops=1)
Index Cond: (school_id = 1)
Planning time: 0.177 ms
Execution time: 0.414 ms
(7 rows)

db=> select VERSION();
version

-----------------------------------------------------------------------------
PostgreSQL 10.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2, 64-bit
(1 row)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kumar, Virendra 2018-12-21 18:39:28 RE: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Previous Message Kumar, Virendra 2018-12-21 15:11:40 RE: Multiple LDAP Servers for ldap Authentication

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-12-21 16:43:06 Re: Remove Deprecated Exclusive Backup Mode
Previous Message Robert Haas 2018-12-21 16:35:24 Re: ATTACH/DETACH PARTITION CONCURRENTLY