Re: inputs into query planner costing

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Mike Roest <mike(dot)roest(at)replicon(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: inputs into query planner costing
Date: 2015-03-31 17:56:49
Message-ID: 551ADFE1.50706@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 3/31/2015 10:31 AM, Mike Roest wrote:
> Hi there,
> I'm having an issue with query performance between 2 different pgsql
> environments.
>
> Ther first is our current production postgres server with is running
> 9.3.5 on Centos 5 x64. The second system is Amazon's RDS postgres as a
> service. On our local DB server we have a query that executes in a
> reasonable amount of time (600 msec). On RDS the query will run for
> more then 10 minutes on a similarly CPU specced systems. I've been
> working through with Amazon support and I'm looking for more suggestions
> on where to look (both for me and to direct Amazon). The RDS system
> does use a network filesystem while our production server is a local
> RAID10 array, I can see that effecting the actual performance of the
> query but not the query planner costing (unless there's an input to
> query planner costing that I can't find)
>
> The Query plan costs generated by the 2 systems are vastly different,
> while the plans themselves are basically identical other then the
> materialization that RDS is doing (if I disable the materialization then
> they are almost the same other then a seq scan/heap scan on one small
> <2000 row table). All the tables in the query have been analyzed on
> each server without any impact
>
> Current Production
> Explain:
> http://explain.depesz.com/s/Tkyc
> Explain Analyze
> http://explain.depesz.com/s/UnQt
>
> RDS: (with enable_material=off)
> http://explain.depesz.com/s/vDiV
>
> (with enable_material=on)
> http://explain.depesz.com/s/HUjx
>
> I have validated that all the query planning configuration variables on
> this page
> http://www.postgresql.org/docs/9.3/static/runtime-config-query.html are
> the same between the 2 environments. If I modify the local production
> system values for things like random_page_cost and seq_page_cost to
> absurd values like 60000 I can get it to generate a similar planner
> cost. Similarly if I lower the RDS values to absurdly low values like
> .00000001 I can get it to generate a similarly costed plan (while still
> performing horridly).
>
> I've reached the end of things I can think about (I'm also working on
> rewriting the query but it's a generated query out of a infrastructure
> component so it's not a simple change).
>
> Just looking for any ideas on additional things to look into.
>
> The query is available here:
> https://www.dropbox.com/s/m31ct6k0mod0576/simplifiedquery.sql?dl=0
>
> --
> Data's inconvienient when people have opinions.

Seems like there is no useable index on table timesheet. It always
seems to table scan all 99K rows, several times.

I'll bet on RDS that table scan is super slow.

The RDS plans seem to be just explain? I assume its too slow to run an
explain analyze on? Would be neat to see explain analyze from RDS.
(any way to add a little extra where magic to cut the rows down to a
useable, but still slow, sample?)

On this one:
http://explain.depesz.com/s/UnQt

Line 11 table scans 99K rows, then all those rows are carried up the
chain (lines 10, 9, 8, 7 and 6). Any way to reduce the row count
earlier? Line 5 finally seems to filter out 94K rows.

Would be neat to see if these are buffered reads or are actually hitting
disk too. (something like EXPLAIN (ANALYZE, BUFFERS) select...)

-Andy

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2015-03-31 18:16:09 Re: inputs into query planner costing
Previous Message Andy Colson 2015-03-31 17:12:22 Re: Partial index-based load balancing