Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

From: Mark <mwchambers(at)gmail(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Date: 2019-01-02 12:27:18
Message-ID: CAFh58O8CSuK52JgpDPHWuFZxqCkQSEmH4c2-66yXc+FALJmaOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi Maxim,

Thanks for your help.

Jeff has pointed me in the right direction here, it seems that the rows=1
is due to a long running transaction which deletes all of common_student
for school_id 36 and then repopulates it.

I was unaware that the delete inside the transaction would affect the
VACUUM ANALYSE. As I said to Jeff I will have to rewrite the application to
update the rows if this is the normal behaviour.

Here are the queries:

db=> explain analyze select * from common_student where school_id = 36;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=88.30..3846.49 rows=1533
width=384) (actual time=4.852..7.065 rows=1388 loops=1)
Recheck Cond: (school_id = 36)
Heap Blocks: exact=67
-> Bitmap Index Scan on idx_common_student_sid (cost=0.00..87.91
rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
Index Cond: (school_id = 36)
Planning time: 0.097 ms
Execution time: 8.084 ms
(7 rows)

db=> /* At this point I have started a long running transaction that
deletes all of common_student for school_id 36 */ ;

db=> analyse verbose common_student(school_id);
INFO: analyzing "public.common_student"
INFO: "common_student": scanned 7322 of 7322 pages, containing 65431 live
rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows
ANALYZE
db=> explain analyze select * from common_student where school_id = 36;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_common_student_sid on common_student
(cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388
loops=1)
Index Cond: (school_id = 36)
Planning time: 0.098 ms
Execution time: 2.583 ms
(4 rows)

db=> create table test_table AS SELECT * from common_student;
SELECT 65431
db=> vacuum analyze test_table;
VACUUM
db=> explain analyze select * from test_table where school_id = 36;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..3848.89 rows=1381 width=384) (actual
time=10.105..22.680 rows=1388 loops=1)
Filter: (school_id = 36)
Rows Removed by Filter: 64043
Planning time: 0.390 ms
Execution time: 23.767 ms
(5 rows)

db=> drop table test_table;
DROP TABLE
db=> create table test_table AS SELECT * from common_student ORDER BY
school_id;
SELECT 65431
db=> vacuum analyze test_table;
VACUUM
db=> explain analyze select * from test_table where school_id = 36;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..3850.89 rows=1341 width=382) (actual
time=5.674..27.585 rows=1388 loops=1)
Filter: (school_id = 36)
Rows Removed by Filter: 64043
Planning time: 0.264 ms
Execution time: 28.643 ms
(5 rows)

db=> explain analyze select * from test_table where school_id = 36;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..3850.89 rows=1341 width=382) (actual
time=20.848..43.272 rows=1388 loops=1)
Filter: (school_id = 36)
Rows Removed by Filter: 64043
Planning time: 0.068 ms
Execution time: 44.423 ms
(5 rows)

db=> /* At this point I have killed the long running transaction that
deletes all of common_student for school_id 36 */ ;
db=> vacuum analyze common_student;
VACUUM
db=> explain analyze select * from common_student where school_id = 36;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=79.17..3357.79 rows=1388
width=383) (actual time=0.088..1.302 rows=1388 loops=1)
Recheck Cond: (school_id = 36)
Heap Blocks: exact=67
-> Bitmap Index Scan on idx_common_student_sid (cost=0.00..78.83
rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
Index Cond: (school_id = 36)
Planning time: 0.327 ms
Execution time: 2.311 ms
(7 rows)

On Sun, 23 Dec 2018 at 15:28 Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

> Hi Mark,
>
> It's look very weird.
> Can you try something like this (check that you have enough disk space for
> second copy of common_student before):
>
> create table test_table AS SELECT * from common_student;
> Vacuum analyze test_table;
> explain analyze select * from test_table where school_id = 36;
> drop table test_table;
> create table test_table AS SELECT * from common_student ORDER BY school_id
> ;
> Vacuum analyze test_table;
> explain analyze select * from test_table where school_id = 36;
> drop table test_table;
>
> And provide results of both explain analyze queries.
>
>
>
> On Sat, Dec 22, 2018 at 3:39 AM Mark <mwchambers(at)gmail(dot)com> wrote:
>
>> Hi All,
>>
>> I'm seeing some very slow queries and it looks like the query planner is
>> deciding to do a 'Nested Loop Left Join'
>> which is slow. When I SET enable_nestloop=OFF for testing it does a
>> 'Hash Left Join' which is much faster.
>>
>> I think I can see the cause of the problem in the examples below. I
>> can't make sense of the statistics being written
>> by ANALYSE or the planners row estimates.
>>
>> Can anyone help me understand....
>>
>> - Why the row estimate I get in the query below for school_id = 36 is 1
>> ? ( I expect it to be higher)
>> - Why does '1' appear in the most_common_vals when it is actually the
>> least common value.
>> - Why doesn't 36 appear in the most_common_vals (it is more common than 1)
>> - Does the analyse output below mean that it only scanned 51538 of 65463
>> rows in the table? Is school_id 36 just being missed in the sample? (This
>> happens when the analyse is repeated )
>>
>> Any help with understanding what's happening here would be much
>> appreciated.
>>
>> I hope I've provided enough information below.
>>
>> Thanks,
>>
>> Mark
>>
>> db=> explain analyse select * from common_student where school_id = 36 ;
>> QUERY
>> PLAN
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------
>> Index Scan using idx_common_student_sid on common_student
>> (cost=0.41..8.39 rows=1 width=385) (actual time=0.264..1.691 rows=1388
>> loops=1)
>> Index Cond: (school_id = 36)
>> Planning time: 0.087 ms
>> Execution time: 2.706 ms
>> (4 rows)
>>
>> db=> select
>> tablename,attname,most_common_vals,histogram_bounds,n_distinct from
>> pg_stats where attname='school_id' and tablename='common_stude
>> nt';
>> tablename | attname |
>> most_common_vals
>> | histogram_bounds | n_distinct
>>
>> ----------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
>> common_student | school_id |
>> {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1}
>> | | 45
>> (1 row)
>>
>> db=> select count(distinct(school_id)) from common_student ;
>> count
>> -------
>> 55
>> (1 row)
>>
>> db=> alter table common_student alter column school_id set statistics
>> 100000;
>> WARNING: lowering statistics target to 10000
>> ALTER TABLE
>> db=> analyse verbose common_student(school_id);
>> INFO: analyzing "public.common_student"
>> INFO: "common_student": scanned 7322 of 7322 pages, containing 65463
>> live rows and 49026 dead rows; 51538 rows in sample, 65463 estimated total
>> rows
>> ANALYZE
>> db=> select
>> tablename,attname,most_common_vals,histogram_bounds,n_distinct from
>> pg_stats where attname='school_id' and tablename='common_stude
>> nt';
>> tablename | attname |
>> most_common_vals
>> | histogram_bounds | n_distinct
>>
>> ----------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
>> common_student | school_id |
>> {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1}
>> | | 45
>> (1 row)
>>
>> db=> explain analyse select * from common_student where school_id = 36 ;
>> QUERY
>> PLAN
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------
>> Index Scan using idx_common_student_sid on common_student
>> (cost=0.41..8.39 rows=1 width=385) (actual time=0.542..4.022 rows=1388
>> loops=1)
>> Index Cond: (school_id = 36)
>> Planning time: 0.334 ms
>> Execution time: 6.542 ms
>> (4 rows)
>>
>> db=> select school_id, count(*) from common_student group by school_id
>> order by count(*) limit 6 ;
>> school_id | count
>> -----------+-------
>> 1 | 50
>> 88 | 161
>> 53 | 252
>> 94 | 422
>> 31 | 434
>> 68 | 454
>> (6 rows)
>>
>> dvpjxbzc=> select school_id, count(*) from common_student where school_id
>> = 36 group by school_id ;
>> school_id | count
>> -----------+-------
>> 36 | 1388
>> (1 row)
>>
>> db=> explain analyse select * from common_student where school_id = 1 ;
>> QUERY PLAN
>>
>>
>> ----------------------------------------------------------------------------------------------------------------------------------
>> Bitmap Heap Scan on common_student (cost=4.91..243.76 rows=64
>> width=385) (actual time=0.128..0.267 rows=50 loops=1)
>> Recheck Cond: (school_id = 1)
>> Heap Blocks: exact=16
>> -> Bitmap Index Scan on "unique common_student" (cost=0.00..4.90
>> rows=64 width=0) (actual time=0.110..0.110 rows=50 loops=1)
>> Index Cond: (school_id = 1)
>> Planning time: 0.177 ms
>> Execution time: 0.414 ms
>> (7 rows)
>>
>> db=> select VERSION();
>> version
>>
>>
>> -----------------------------------------------------------------------------
>> PostgreSQL 10.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2,
>> 64-bit
>> (1 row)
>>
>>
>
> --
> Maxim Boguk
> Senior Postgresql DBA
> https://dataegret.com/
>
> Phone RU: +7 985 433 0000 <+7%20985%20433-00-00>
> Phone UA: +380 99 143 0000 <+380%2099%20143%200000>
> Phone AU: +61 45 218 5678
>
> LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
> Skype: maxim.boguk
>
> "Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно
> когда я так делаю ещё раз?"
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2019-01-02 14:04:26 Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Previous Message Mark 2019-01-02 12:15:24 Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Browse pgsql-hackers by date

  From Date Subject
Next Message Surafel Temesgen 2019-01-02 12:50:24 Re: pg_dump multi VALUES INSERT
Previous Message Mark 2019-01-02 12:15:24 Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2