Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

From: Mark <mwchambers(at)gmail(dot)com>
To: "Kumar, Virendra" <Virendra(dot)Kumar(at)guycarp(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Date: 2018-12-22 09:14:05
Message-ID: CAFh58O9b6pLtB0UfxaXszVGr_xWW_5nQFDeOZ+W=yvu-D82=AA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi Virendra,

Thanks for the reply.

You must have missed it, but I've already done that (see my original
email). The stats target for that column is already set to 10000.

db=> SELECT attstattarget FROM pg_attribute WHERE attrelid =
'public.common_student'::regclass AND attname = 'school_id';
attstattarget
---------------
10000
(1 row)

Mark

On Fri, 21 Dec 2018 at 18:39 Kumar, Virendra <Virendra(dot)Kumar(at)guycarp(dot)com>
wrote:

> Hi Mark,
>
>
>
> Can you try setting up stats target for school_id column and analyze table
> and see where it takes, something like:
>
> --
>
> ALTER table common_student ALTER COLUMN school_id SET STATISTICS 10000;
>
> ANALYZE common_stundent;
>
>
>
> Regards,
>
> Virendra
>
>
>
> *From:* Mark [mailto:mwchambers(at)gmail(dot)com]
> *Sent:* Friday, December 21, 2018 11:39 AM
> *To:* pgsql-general(at)lists(dot)postgresql(dot)org
> *Subject:* Query planner / Analyse statistics bad estimate rows=1 with
> maximum statistics 10000 on PostgreSQL 10.2
>
>
>
> Hi All,
>
>
>
> I'm seeing some very slow queries and it looks like the query planner is
> deciding to do a 'Nested Loop Left Join'
>
> which is slow. When I SET enable_nestloop=OFF for testing it does a 'Hash
> Left Join' which is much faster.
>
>
>
> I think I can see the cause of the problem in the examples below. I can't
> make sense of the statistics being written
>
> by ANALYSE or the planners row estimates.
>
>
>
> Can anyone help me understand....
>
>
>
> - Why the row estimate I get in the query below for school_id = 36 is 1 ?
> ( I expect it to be higher)
>
> - Why does '1' appear in the most_common_vals when it is actually the
> least common value.
>
> - Why doesn't 36 appear in the most_common_vals (it is more common than 1)
>
> - Does the analyse output below mean that it only scanned 51538 of 65463
> rows in the table? Is school_id 36 just being missed in the sample? (This
> happens when the analyse is repeated )
>
>
>
> Any help with understanding what's happening here would be much
> appreciated.
>
>
>
> I hope I've provided enough information below.
>
>
>
> Thanks,
>
>
>
> Mark
>
>
>
> db=> explain analyse select * from common_student where school_id = 36 ;
>
> QUERY
> PLAN
>
>
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Index Scan using idx_common_student_sid on common_student
> (cost=0.41..8.39 rows=1 width=385) (actual time=0.264..1.691 rows=1388
> loops=1)
>
> Index Cond: (school_id = 36)
>
> Planning time: 0.087 ms
>
> Execution time: 2.706 ms
>
> (4 rows)
>
>
>
> db=> select tablename,attname,most_common_vals,histogram_bounds,n_distinct
> from pg_stats where attname='school_id' and tablename='common_stude
>
> nt';
>
> tablename | attname |
> most_common_vals
> | histogram_bounds | n_distinct
>
>
> ----------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
>
> common_student | school_id |
> {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1}
> | | 45
>
> (1 row)
>
>
>
> db=> select count(distinct(school_id)) from common_student ;
>
> count
>
> -------
>
> 55
>
> (1 row)
>
>
>
> db=> alter table common_student alter column school_id set statistics
> 100000;
>
> WARNING: lowering statistics target to 10000
>
> ALTER TABLE
>
> db=> analyse verbose common_student(school_id);
>
> INFO: analyzing "public.common_student"
>
> INFO: "common_student": scanned 7322 of 7322 pages, containing 65463 live
> rows and 49026 dead rows; 51538 rows in sample, 65463 estimated total rows
>
> ANALYZE
>
> db=> select tablename,attname,most_common_vals,histogram_bounds,n_distinct
> from pg_stats where attname='school_id' and tablename='common_stude
>
> nt';
>
> tablename | attname |
> most_common_vals
> | histogram_bounds | n_distinct
>
>
> ----------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
>
> common_student | school_id |
> {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1}
> | | 45
>
> (1 row)
>
>
>
> db=> explain analyse select * from common_student where school_id = 36 ;
>
> QUERY
> PLAN
>
>
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Index Scan using idx_common_student_sid on common_student
> (cost=0.41..8.39 rows=1 width=385) (actual time=0.542..4.022 rows=1388
> loops=1)
>
> Index Cond: (school_id = 36)
>
> Planning time: 0.334 ms
>
> Execution time: 6.542 ms
>
> (4 rows)
>
>
>
> db=> select school_id, count(*) from common_student group by school_id
> order by count(*) limit 6 ;
>
> school_id | count
>
> -----------+-------
>
> 1 | 50
>
> 88 | 161
>
> 53 | 252
>
> 94 | 422
>
> 31 | 434
>
> 68 | 454
>
> (6 rows)
>
>
>
> dvpjxbzc=> select school_id, count(*) from common_student where school_id
> = 36 group by school_id ;
>
> school_id | count
>
> -----------+-------
>
> 36 | 1388
>
> (1 row)
>
>
>
> db=> explain analyse select * from common_student where school_id = 1 ;
>
> QUERY PLAN
>
>
>
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Bitmap Heap Scan on common_student (cost=4.91..243.76 rows=64 width=385)
> (actual time=0.128..0.267 rows=50 loops=1)
>
> Recheck Cond: (school_id = 1)
>
> Heap Blocks: exact=16
>
> -> Bitmap Index Scan on "unique common_student" (cost=0.00..4.90
> rows=64 width=0) (actual time=0.110..0.110 rows=50 loops=1)
>
> Index Cond: (school_id = 1)
>
> Planning time: 0.177 ms
>
> Execution time: 0.414 ms
>
> (7 rows)
>
>
>
> db=> select VERSION();
>
> version
>
>
>
> -----------------------------------------------------------------------------
>
> PostgreSQL 10.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2,
> 64-bit
>
> (1 row)
>
>
>
> ------------------------------
>
> This message is intended only for the use of the addressee and may contain
> information that is PRIVILEGED AND CONFIDENTIAL.
>
> If you are not the intended recipient, you are hereby notified that any
> dissemination of this communication is strictly prohibited. If you have
> received this communication in error, please erase all copies of the
> message
> and its attachments and notify the sender immediately. Thank you.
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ricardo Martin Gomez 2018-12-22 16:42:10 Re: Watching for view changes
Previous Message Mitar 2018-12-22 07:41:16 Re: Watching for view changes

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?= 2018-12-22 13:33:23 Re: [PATCH] Improve tab completion for CREATE TABLE
Previous Message Simon Riggs 2018-12-22 08:18:35 Re: Joins on TID