Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Date: 2019-01-02 15:44:57
Message-ID: c0ac4d7a-62d2-eea1-5818-4f9a45cbe453@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Try using SERIALIZABLE transactions instead of the default READ COMMITTED.

On 1/2/19 9:28 AM, Mark wrote:
> Hi Ron,
>
> Yes, my process will commit the transaction (or roll it back) eventually.
> It's the window where
> one transaction has deleted all the rows (before committing) and an
> ANALYSE has ran.
>
> The deleted rows won't make it into the sample even though the transaction
> has NOT been commited.
>
> During this time I will get bad row estimates on rows for that ID. You can
> see this in the example below with two
> database connections (labelled 1 and 2).
>
> I would have expected the DELETE to have no effect until it was committed.
>
> connection 1=> create table test_table(school_id integer, note text);
> CREATE TABLE
> connection 1=> insert into test_table(school_id, note) SELECT g.id
> <http://g.id>,md5(random()::text) FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> insert into test_table(school_id, note) SELECT g.id
> <http://g.id>,md5(random()::text) FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> insert into test_table(school_id, note) SELECT g.id
> <http://g.id>,md5(random()::text) FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> select * from test_table ;
>  school_id |               note
> -----------+----------------------------------
>          1 | 0e08cf3990a04f0e943584517c564d31
>          2 | 96bf83ae5f8eb9342e8408b1ac25cb14
>          1 | f8fd943012edfe42a03a421df660bc33
>          2 | cd5d7ff0abca61f18857df9b21d234e0
>          1 | 60d731f430cb68c7285ddbcd9186eaa0
>          2 | 635e6c9cf305147ad8684213f0a9299c
> (6 rows)
>
> connection 1=> analyse verbose test_table ;
> INFO:  analyzing "public.test_table"
> INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
> dead rows; 6 rows in sample, 6 estimated total rows
> ANALYZE
> connection 1=> explain analyse select * from test_table where school_id = 2 ;
>                                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Seq Scan on test_table  (cost=0.00..1.07 rows=3 width=37) (actual
> time=0.011..0.015 rows=3 loops=1)
>    Filter: (school_id = 2)
>    Rows Removed by Filter: 3
>  Planning time: 0.164 ms
>  Execution time: 0.043 ms
> (5 rows)
>
> connection 2=> BEGIN ;
> BEGIN
> connection 2=> delete from test_table where school_id = 2 ;
> DELETE 3
> connection 2=> /* This connection is now idle */
>
>
> connection 1=> analyse verbose test_table ;
> INFO:  analyzing "public.test_table"
> INFO:  "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
> dead rows; 3 rows in sample, 6 estimated total rows
> ANALYZE
> connection 1=> explain analyse select * from test_table where school_id = 2 ;
>                                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Seq Scan on test_table  (cost=0.00..1.07 rows=1 width=37) (actual
> time=0.009..0.014 rows=3 loops=1)
>    Filter: (school_id = 2)
>    Rows Removed by Filter: 3
>  Planning time: 0.095 ms
>  Execution time: 0.039 ms
> (5 rows)
>
>
> On Wed, 2 Jan 2019 at 14:04 Ron <ronljohnsonjr(at)gmail(dot)com
> <mailto:ronljohnsonjr(at)gmail(dot)com>> wrote:
>
> But Jeff said "left open indefinitely without either committing or
> rolling back".  Your process is committing the transaction.
>
>
> On 1/2/19 6:15 AM, Mark wrote:
>> Hi Jeff,
>>
>> Thanks for your help. That is exactly what is happening.
>>
>> I have a long running job which deletes all of the common_student
>> table and then repopulates it. It takes long time to load all the
>> other data and commit the transaction. I didn't think the delete
>> inside the transaction would have any effect until it is commited or
>> rolled back.
>>
>> I will have to rewrite the application so it updates the existing
>> rows rather than deleting all and then inserting.
>>
>> Thanks again for helping me understand what's happening here.
>>
>> Proof:
>>
>> db=> explain analyze select * from common_student where school_id = 36;
>>               QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------------
>>  Bitmap Heap Scan on common_student (cost=88.30..3846.49 rows=1533
>> width=384) (actual time=4.852..7.065 rows=1388 loops=1)
>>    Recheck Cond: (school_id = 36)
>>    Heap Blocks: exact=67
>>    ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..87.91
>> rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
>>          Index Cond: (school_id = 36)
>>  Planning time: 0.097 ms
>>  Execution time: 8.084 ms
>> (7 rows)
>>
>> db=> /* At this point I have started a long running transaction that
>> deletes all of common_student for school_id 36  */ ;
>>
>> db=> analyse verbose common_student(school_id);
>> INFO:  analyzing "public.common_student"
>> INFO:  "common_student": scanned 7322 of 7322 pages, containing 65431
>> live rows and 8060 dead rows; 56818 rows in sample, 65431 estimated
>> total rows
>> ANALYZE
>> db=> explain analyze select * from common_student where school_id = 36;
>>                  QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------------------
>>  Index Scan using idx_common_student_sid on common_student 
>> (cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498
>> rows=1388 loops=1)
>>    Index Cond: (school_id = 36)
>>  Planning time: 0.098 ms
>>  Execution time: 2.583 ms
>> (4 rows)
>>
>> db=> /* At this point I have killed the long running transaction that
>> deletes all of common_student for school_id 36  */ ;
>> db=> vacuum analyze common_student;
>> VACUUM
>> db=> explain analyze select * from common_student where school_id = 36;
>>               QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------------
>>  Bitmap Heap Scan on common_student (cost=79.17..3357.79 rows=1388
>> width=383) (actual time=0.088..1.302 rows=1388 loops=1)
>>    Recheck Cond: (school_id = 36)
>>    Heap Blocks: exact=67
>>    ->  Bitmap Index Scan on idx_common_student_sid  (cost=0.00..78.83
>> rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
>>          Index Cond: (school_id = 36)
>>  Planning time: 0.327 ms
>>  Execution time: 2.311 ms
>> (7 rows)
>>
>>
>> On Sun, 23 Dec 2018 at 02:57 Jeff Janes <jeff(dot)janes(at)gmail(dot)com
>> <mailto:jeff(dot)janes(at)gmail(dot)com>> wrote:
>>
>>
>> - Does the analyse output below mean that it only scanned
>> 51538 of 65463 rows in the table? Is school_id 36 just being
>> missed in the sample? (This happens when the analyse is
>> repeated )
>>
>>
>> Is there a transaction which had deleted all of school_id=36, and
>> then was just left open indefinitely without either committing or
>> rolling back?
>>
>> That would explain it, and I don't know of anything else that
>> could.  The deleted but not committed tuples are still live, but
>> don't get sampled.
>>
>> Cheers,
>>
>> Jeff
>>
>
> --
> Angular momentum makes the world go 'round.
>

--
Angular momentum makes the world go 'round.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mark 2019-01-02 16:04:03 Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Previous Message Mark 2019-01-02 15:28:02 Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2019-01-02 15:58:35 backslash-dot quoting in COPY CSV
Previous Message Heikki Linnakangas 2019-01-02 15:36:33 Re: GiST VACUUM