Re: Query Slow After 2018

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Nur Agus <nuragus(dot)linux(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Slow After 2018
Date: 2018-01-28 17:51:10
Message-ID: 20180128175110.GA18115@telsasoft.com
Lists: pgsql-performance

On Mon, Jan 29, 2018 at 12:32:59AM +0700, Nur Agus wrote:
> The following query runs in just 9 ms:

> "distrib_reports"."month" = 1 AND
> "distrib_reports"."year" = 2017 AND
> "distrib_reports"."state" = 'SUBMITTED' AND
> "distrib_report_groups"."distrib_report_group_type_id" =
> '559a5fdc-418d-4494-aebf-80ecf8743d35'

> The explain analyze of the 2 queries results in really different
> query plans; here are the links to depesz:
> 2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1
> 2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y

> The question is, why does the query planner choose such a different path just
> by changing one parameter?

Looks like this badly underestimates its rowcount:

Index Scan using index_distrib_reports_on_year on distrib_reports (cost=0.42..40.62 rows=8 width=32) (actual time=0.034..50.452 rows=17,055 loops=1)
Index Cond: (year = 2018)
Filter: ((month = 1) AND ((state)::text = 'SUBMITTED'::text))
Rows Removed by Filter: 1049

Maybe because "if year==2018" then month=1 does essentially nothing ..
..but postgres thinks it'll filter out some 90% of the rows.

And possibly the same for SUBMITTED (?)
You should probably use a timestamp (or date) column rather than integer year+month.
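For example, something along these lines (a minimal sketch; the report_date
column and the backfill are hypothetical, not from your schema):

  -- add a real date column, backfill it from the existing integers, and index it
  ALTER TABLE distrib_reports ADD COLUMN report_date date;
  UPDATE distrib_reports SET report_date = make_date(year, month, 1);
  CREATE INDEX ON distrib_reports (report_date);

  -- then filter with a range condition instead of year/month equality
  SELECT count(*)
  FROM distrib_reports
  WHERE report_date >= '2018-01-01'
    AND report_date <  '2018-02-01'
    AND state = 'SUBMITTED';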

On PG10, you could probably work around it using "CREATE STATISTICS".
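Something like this (a sketch; the statistics name is arbitrary):

  -- PG10: let the planner see the dependency between year, month and state
  CREATE STATISTICS distrib_reports_ym_state_stats (dependencies)
      ON year, month, state FROM distrib_reports;
  ANALYZE distrib_reports;

Functional-dependency statistics only help simple equality conditions, but that's
exactly what this query uses.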

> This behaviour is *not reproducible* on postgres-10. On postgres-10, the
> query plans are consistent, and both have very acceptable times:
> 2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5
> 2018 --> explain result on postgres-10: https://explain.depesz.com/s/Tf5K

..I think the default max_parallel_workers_per_gather=3 by chance causes the plans
to be the same.

I think there's still a rowcount underestimate with PG10 (without CREATE
STATISTICS), but it's masked by the "rows=1 actual rows=0" roundoff error with a
high loop count.

Justin
