Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

From: Mark <mwchambers(at)gmail(dot)com>
To: Ron <ronljohnsonjr(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Date: 2019-01-02 16:04:03
Message-ID: CAFh58O9T+EGbqveWt3O28cP+tkv7k4+Nx7fE_wyje_KLeBUZug@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hi Ron,

I tried my test_table example below, swapping 'BEGIN' for:

=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

It made no difference to the behaviour.
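
Incidentally, you can watch the statistics go wrong directly in pg_stats:
after the second ANALYSE in the example below, school_id = 2 should drop
out of most_common_vals even though its rows are still visible to every
snapshot. A quick check (standard catalog view, run from connection 1):

=> SELECT most_common_vals, most_common_freqs, n_distinct
   FROM pg_stats
   WHERE tablename = 'test_table' AND attname = 'school_id';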

On Wed, 2 Jan 2019 at 15:45 Ron <ronljohnsonjr(at)gmail(dot)com> wrote:

> Try using SERIALIZABLE transactions instead of the default READ COMMITTED.
>
>
> On 1/2/19 9:28 AM, Mark wrote:
>
> Hi Ron,
>
> Yes, my process will commit the transaction (or roll it back) eventually.
> The problem is the window where one transaction has deleted all the rows
> (before committing) and an ANALYSE has run.
>
> The deleted rows won't make it into the sample even though the transaction
> has NOT been committed.
>
> During this time I will get bad row estimates for that ID. You can see
> this in the example below with two database connections (labelled 1 and 2).
>
> I would have expected the DELETE to have no effect until it was committed.
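>
> In production, the culprit looks exactly like connection 2 in the example:
> a session sitting idle in transaction with an uncommitted DELETE. A
> standard pg_stat_activity query will show any such session:
>
> => SELECT pid, state, xact_start, query
>    FROM pg_stat_activity
>    WHERE state = 'idle in transaction';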
>
> connection 1=> create table test_table(school_id integer, note text);
> CREATE TABLE
> connection 1=> insert into test_table(school_id, note) SELECT g.id,md5(random()::text)
> FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> insert into test_table(school_id, note) SELECT g.id,md5(random()::text)
> FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> insert into test_table(school_id, note) SELECT g.id,md5(random()::text)
> FROM generate_series(1,2) as g(id);
> INSERT 0 2
> connection 1=> select * from test_table ;
> school_id | note
> -----------+----------------------------------
> 1 | 0e08cf3990a04f0e943584517c564d31
> 2 | 96bf83ae5f8eb9342e8408b1ac25cb14
> 1 | f8fd943012edfe42a03a421df660bc33
> 2 | cd5d7ff0abca61f18857df9b21d234e0
> 1 | 60d731f430cb68c7285ddbcd9186eaa0
> 2 | 635e6c9cf305147ad8684213f0a9299c
> (6 rows)
>
> connection 1=> analyse verbose test_table ;
> INFO: analyzing "public.test_table"
> INFO: "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
> dead rows; 6 rows in sample, 6 estimated total rows
> ANALYZE
> connection 1=> explain analyse select * from test_table where school_id =
> 2 ;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------
> Seq Scan on test_table (cost=0.00..1.07 rows=3 width=37) (actual
> time=0.011..0.015 rows=3 loops=1)
> Filter: (school_id = 2)
> Rows Removed by Filter: 3
> Planning time: 0.164 ms
> Execution time: 0.043 ms
> (5 rows)
>
> connection 2=> BEGIN ;
> BEGIN
> connection 2=> delete from test_table where school_id = 2 ;
> DELETE 3
> connection 2=> /* This connection is now idle */
>
>
> connection 1=> analyse verbose test_table ;
> INFO: analyzing "public.test_table"
> INFO: "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
> dead rows; 3 rows in sample, 6 estimated total rows
> ANALYZE
> connection 1=> explain analyse select * from test_table where school_id =
> 2 ;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------
> Seq Scan on test_table (cost=0.00..1.07 rows=1 width=37) (actual
> time=0.009..0.014 rows=3 loops=1)
> Filter: (school_id = 2)
> Rows Removed by Filter: 3
> Planning time: 0.095 ms
> Execution time: 0.039 ms
> (5 rows)
>
>
> On Wed, 2 Jan 2019 at 14:04 Ron <ronljohnsonjr(at)gmail(dot)com> wrote:
>
>> But Jeff said "left open indefinitely without either committing or
>> rolling back". Your process is committing the transaction.
>>
>>
>> On 1/2/19 6:15 AM, Mark wrote:
>>
>> Hi Jeff,
>>
>> Thanks for your help. That is exactly what is happening.
>>
>> I have a long-running job which deletes all of the common_student table
>> and then repopulates it. It takes a long time to load all the other data
>> and commit the transaction. I didn't think the delete inside the
>> transaction would have any effect until it was committed or rolled back.
>>
>> I will have to rewrite the application so it updates the existing rows
>> rather than deleting all and then inserting.
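>>
>> Something like a per-row upsert ought to avoid the window entirely. A
>> rough sketch, assuming a unique key exists on (school_id, student_id);
>> the student_id and note columns here are hypothetical:
>>
>> INSERT INTO common_student (school_id, student_id, note)
>> VALUES (36, 101, 'replacement row')
>> ON CONFLICT (school_id, student_id)
>> DO UPDATE SET note = EXCLUDED.note;
>>
>> Rows that no longer exist in the source data would still need a targeted
>> DELETE afterwards, but the table's statistics stay sane throughout.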
>>
>> Thanks again for helping me understand what's happening here.
>>
>> Proof:
>>
>> db=> explain analyze select * from common_student where school_id = 36;
>> QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------------
>> Bitmap Heap Scan on common_student (cost=88.30..3846.49 rows=1533
>> width=384) (actual time=4.852..7.065 rows=1388 loops=1)
>> Recheck Cond: (school_id = 36)
>> Heap Blocks: exact=67
>> -> Bitmap Index Scan on idx_common_student_sid (cost=0.00..87.91
>> rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
>> Index Cond: (school_id = 36)
>> Planning time: 0.097 ms
>> Execution time: 8.084 ms
>> (7 rows)
>>
>> db=> /* At this point I have started a long running transaction that
>> deletes all of common_student for school_id 36 */ ;
>>
>> db=> analyse verbose common_student(school_id);
>> INFO: analyzing "public.common_student"
>> INFO: "common_student": scanned 7322 of 7322 pages, containing 65431
>> live rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total
>> rows
>> ANALYZE
>> db=> explain analyze select * from common_student where school_id = 36;
>> QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------------------
>> Index Scan using idx_common_student_sid on common_student
>> (cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388
>> loops=1)
>> Index Cond: (school_id = 36)
>> Planning time: 0.098 ms
>> Execution time: 2.583 ms
>> (4 rows)
>>
>> db=> /* At this point I have killed the long running transaction that
>> deletes all of common_student for school_id 36 */ ;
>> db=> vacuum analyze common_student;
>> VACUUM
>> db=> explain analyze select * from common_student where school_id = 36;
>> QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------------
>> Bitmap Heap Scan on common_student (cost=79.17..3357.79 rows=1388
>> width=383) (actual time=0.088..1.302 rows=1388 loops=1)
>> Recheck Cond: (school_id = 36)
>> Heap Blocks: exact=67
>> -> Bitmap Index Scan on idx_common_student_sid (cost=0.00..78.83
>> rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
>> Index Cond: (school_id = 36)
>> Planning time: 0.327 ms
>> Execution time: 2.311 ms
>> (7 rows)
>>
>>
>> On Sun, 23 Dec 2018 at 02:57 Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>>>
>>>> - Does the analyse output below mean that it only scanned 51538 of
>>>> 65463 rows in the table? Is school_id 36 just being missed in the sample?
>>>> (This happens when the analyse is repeated )
>>>>
>>>
>>> Is there a transaction which had deleted all of school_id=36, and then
>>> was just left open indefinitely without either committing or rolling back?
>>>
>>> That would explain it, and I don't know of anything else that could.
>>> The deleted but not committed tuples are still live, but don't get sampled.
>>>
>>> Cheers,
>>>
>>> Jeff
>>>
>>
>> --
>> Angular momentum makes the world go 'round.
>>
>
> --
> Angular momentum makes the world go 'round.
>
