Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

From: Mark <mwchambers(at)gmail(dot)com>
To: Ron <ronljohnsonjr(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Date: 2019-01-02 15:28:02
Message-ID: CAFh58O8zdxfgn4=zNBOoX2aq14Bnm9ad-0eTzYMitbg5nSugKA@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hi Ron,

Yes, my process will commit the transaction (or roll it back) eventually.
The problem is the window where one transaction has deleted all the rows
(but not yet committed) and an ANALYSE has run.

The deleted rows won't make it into the sample even though the transaction
has NOT been committed.

During this time I will get bad row estimates for that ID. You can see
this in the example below with two database connections (labelled 1 and 2).

I would have expected the DELETE to have no effect until it was committed.

connection 1=> create table test_table(school_id integer, note text);
CREATE TABLE
connection 1=> insert into test_table(school_id, note) SELECT g.id, md5(random()::text) FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT g.id, md5(random()::text) FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT g.id, md5(random()::text) FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> select * from test_table ;
 school_id |               note
-----------+----------------------------------
         1 | 0e08cf3990a04f0e943584517c564d31
         2 | 96bf83ae5f8eb9342e8408b1ac25cb14
         1 | f8fd943012edfe42a03a421df660bc33
         2 | cd5d7ff0abca61f18857df9b21d234e0
         1 | 60d731f430cb68c7285ddbcd9186eaa0
         2 | 635e6c9cf305147ad8684213f0a9299c
(6 rows)

connection 1=> analyse verbose test_table ;
INFO:  analyzing "public.test_table"
INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0 dead rows; 6 rows in sample, 6 estimated total rows
ANALYZE
connection 1=> explain analyse select * from test_table where school_id = 2 ;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..1.07 rows=3 width=37) (actual time=0.011..0.015 rows=3 loops=1)
   Filter: (school_id = 2)
   Rows Removed by Filter: 3
 Planning time: 0.164 ms
 Execution time: 0.043 ms
(5 rows)

connection 2=> BEGIN ;
BEGIN
connection 2=> delete from test_table where school_id = 2 ;
DELETE 3
connection 2=> /* This connection is now idle */

connection 1=> analyse verbose test_table ;
INFO:  analyzing "public.test_table"
INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0 dead rows; 3 rows in sample, 6 estimated total rows
ANALYZE
connection 1=> explain analyse select * from test_table where school_id = 2 ;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..1.07 rows=1 width=37) (actual time=0.009..0.014 rows=3 loops=1)
   Filter: (school_id = 2)
   Rows Removed by Filter: 3
 Planning time: 0.095 ms
 Execution time: 0.039 ms
(5 rows)
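
While that window is open, the session responsible shows up in
pg_stat_activity as "idle in transaction". A quick way to spot it (a
sketch against the standard pg_stat_activity view; column names as of
PostgreSQL 10):

-- List backends holding a transaction open while sitting idle,
-- oldest transaction first.
SELECT pid,
       usename,
       state,
       now() - xact_start AS xact_age,
       query              -- last statement that backend ran
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
 ORDER BY xact_start;

Committing, rolling back, or terminating whatever backend this turns up
lets the next ANALYSE sample those rows again.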

On Wed, 2 Jan 2019 at 14:04 Ron <ronljohnsonjr(at)gmail(dot)com> wrote:

> But Jeff said "left open indefinitely without either committing or rolling
> back". Your process is committing the transaction.
>
>
> On 1/2/19 6:15 AM, Mark wrote:
>
> Hi Jeff,
>
> Thanks for your help. That is exactly what is happening.
>
> I have a long-running job which deletes all of the common_student table
> and then repopulates it. It takes a long time to load all the other data
> and commit the transaction. I didn't think the delete inside the
> transaction would have any effect until it was committed or rolled back.
>
> I will have to rewrite the application so it updates the existing rows
> rather than deleting them all and then inserting.
>
> Thanks again for helping me understand what's happening here.
>
> Proof:
>
> db=> explain analyze select * from common_student where school_id = 36;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on common_student  (cost=88.30..3846.49 rows=1533 width=384) (actual time=4.852..7.065 rows=1388 loops=1)
>    Recheck Cond: (school_id = 36)
>    Heap Blocks: exact=67
>    ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..87.91 rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
>          Index Cond: (school_id = 36)
>  Planning time: 0.097 ms
>  Execution time: 8.084 ms
> (7 rows)
>
> db=> /* At this point I have started a long running transaction that
> deletes all of common_student for school_id 36 */ ;
>
> db=> analyse verbose common_student(school_id);
> INFO:  analyzing "public.common_student"
> INFO:  "common_student": scanned 7322 of 7322 pages, containing 65431 live rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows
> ANALYZE
> db=> explain analyze select * from common_student where school_id = 36;
>                                                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using idx_common_student_sid on common_student  (cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388 loops=1)
>    Index Cond: (school_id = 36)
>  Planning time: 0.098 ms
>  Execution time: 2.583 ms
> (4 rows)
>
> db=> /* At this point I have killed the long running transaction that
> deletes all of common_student for school_id 36 */ ;
> db=> vacuum analyze common_student;
> VACUUM
> db=> explain analyze select * from common_student where school_id = 36;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on common_student  (cost=79.17..3357.79 rows=1388 width=383) (actual time=0.088..1.302 rows=1388 loops=1)
>    Recheck Cond: (school_id = 36)
>    Heap Blocks: exact=67
>    ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..78.83 rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
>          Index Cond: (school_id = 36)
>  Planning time: 0.327 ms
>  Execution time: 2.311 ms
> (7 rows)
>
>
> On Sun, 23 Dec 2018 at 02:57 Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>>
>>> - Does the analyse output below mean that it only scanned 51538 of 65463
>>> rows in the table? Is school_id 36 just being missed in the sample? (This
>>> happens when the analyse is repeated.)
>>>
>>
>> Is there a transaction which had deleted all of school_id=36, and then
>> was just left open indefinitely without either committing or rolling back?
>>
>> That would explain it, and I don't know of anything else that could. The
>> deleted but not committed tuples are still live, but don't get sampled.
>>
>> Cheers,
>>
>> Jeff
>>
>
> --
> Angular momentum makes the world go 'round.
>
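
P.S. On the rewrite mentioned above (updating existing rows rather than
deleting them all and re-inserting), the shape I have in mind is an
upsert, so no long-lived transaction ever holds every row for a school
deleted. A minimal sketch, assuming a unique constraint on
(school_id, student_ref); student_ref and the single note column are
made-up names for illustration:

-- Refresh each incoming row with an upsert instead of
-- delete-all-then-insert.
INSERT INTO common_student (school_id, student_ref, note)
VALUES (36, 'S-1001', 'refreshed data')
ON CONFLICT (school_id, student_ref)
DO UPDATE SET note = EXCLUDED.note;

Rows that have disappeared from the source would still need a DELETE, but
that can run in its own short transaction instead of staying open for the
whole load.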
