From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: yet another q
Date: 2010-08-19 06:51:47
Message-ID: AANLkTik5ugyeaGBJ5pFKQ3AhcQpU=dheo7_jkXVXj+5S@mail.gmail.com
Lists: pgsql-performance

The full set of conf changes that were in use during these tests is as
follows:

default_statistics_target = 100 # pgtune wizard 2010-08-17
maintenance_work_mem = 1GB # pgtune wizard 2010-08-17
constraint_exclusion = on # pgtune wizard 2010-08-17
checkpoint_completion_target = 0.9 # pgtune wizard 2010-08-17
effective_cache_size = 36GB # sam
work_mem = 1500MB # pgtune wizard 2010-08-17
wal_buffers = 8MB # pgtune wizard 2010-08-17
#checkpoint_segments = 16 # pgtune wizard 2010-08-17
checkpoint_segments = 30 # sam
shared_buffers = 8GB # pgtune wizard 2010-08-17
max_connections = 80 # pgtune wizard 2010-08-17
cpu_tuple_cost = 0.0030 # sam
cpu_index_tuple_cost = 0.0010 # sam
cpu_operator_cost = 0.0005 # sam
random_page_cost = 2.0 # sam
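
A quick way to confirm which of those overrides the running server actually
picked up (nothing here beyond the stock pg_settings view):

-- settings whose active value comes from postgresql.conf
SELECT name, setting, unit, source
FROM pg_settings
WHERE source = 'configuration file'
ORDER BY name;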

On Wed, Aug 18, 2010 at 11:50 PM, Samuel Gendler
<sgendler(at)ideasculptor(dot)com>wrote:

> On Wed, Aug 18, 2010 at 11:14 PM, Samuel Gendler <
> sgendler(at)ideasculptor(dot)com> wrote:
>
>> Please forgive the barrage of questions. I'm just learning how to tune
>> things in postgres and I've still got a bit of a learning curve to get
>> over, apparently. I have done a lot of reading, though, I swear.
>>
>> I've got two queries that are identical except for one condition, which
>> cuts the number of rows in half and also eliminates one partition from
>> consideration (the table is partitioned over time, and I cut the time
>> period in half). The query plans are considerably different as a result.
>> The net result is that the fast query is 5x faster than the slow one. I'm
>> curious whether the alternate plan is inherently faster, or whether this
>> is just the algorithm scaling worse than linearly with the row count,
>> which certainly wouldn't be surprising. The big win, for me, is that the
>> fast plan's sort uses vastly less memory. The slow plan needs work_mem set
>> to 1500MB just to hold its penalty to 5x; with a more reasonable work_mem
>> (400MB), it worsens to something like 15x because it has to sort on disk.
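>>
>> (For anyone reproducing the disk-sort case: work_mem can be changed per
>> session without touching postgresql.conf, so each memory budget can be
>> tried like this, query elided:)
>>
>> SET work_mem = '400MB';
>> EXPLAIN ANALYZE SELECT ... ;  -- the full query below
>> RESET work_mem;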
>>
>> fast plan: http://explain.depesz.com/s/iZ
>> slow plan: http://explain.depesz.com/s/Dv2
>>
>> query:
>>
>>
>> EXPLAIN ANALYZE
>> SELECT
>>     t_lookup.display_name as group,
>>     to_char(t_fact.time, 'DD/MM HH24:MI') as category,
>>     substring(t_lookup.display_name from 1 for 20) as label,
>>     round(sum(t_fact.total_ms)/sum(t_fact.count)) as value
>> FROM
>>     portal.providers t_lookup,
>>     day_scale_radar_performance_fact t_fact
>> WHERE
>>     t_fact.probe_type_num < 3
>>     and t_lookup.provider_id = t_fact.provider_id
>>     and t_lookup.provider_owner_customer_id = t_fact.provider_owner_customer_id
>>     and t_fact.provider_owner_customer_id = 0
>>     and t_fact.time between timezone('UTC', '2010-08-18 15:00:00') - interval '30 day'
>>                         and timezone('UTC', '2010-08-18 15:00:00')
>> GROUP BY
>>     t_fact.provider_owner_customer_id, t_fact.provider_id,
>>     t_lookup.display_name, t_fact.time
>> ORDER BY
>>     t_fact.time
>>
>> table structure:
>>
>> Table "perf_reporting.abstract_radar_performance_fact"
>> Column | Type | Modifiers
>> ----------------------------+-----------------------------+-----------
>> count | bigint | not null
>> total_ms | bigint | not null
>> time | timestamp without time zone | not null
>> market_num | integer | not null
>> country_num | integer | not null
>> autosys_num | integer | not null
>> provider_owner_zone_id | integer | not null
>> provider_owner_customer_id | integer | not null
>> provider_id | integer | not null
>> probe_type_num | integer | not null
>>
>> with individual indexes on every column from time down through
>> probe_type_num on the child tables
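>>
>> (These are plain inheritance partitions; the names below are made up, but
>> each child looks roughly like this:)
>>
>> -- hypothetical child partition; real table/index names differ
>> CREATE TABLE day_scale_radar_performance_fact_y2010m08 (
>>     CHECK ("time" >= DATE '2010-08-01' AND "time" < DATE '2010-09-01')
>> ) INHERITS (day_scale_radar_performance_fact);
>>
>> CREATE INDEX day_scale_rpf_y2010m08_time_idx
>>     ON day_scale_radar_performance_fact_y2010m08 ("time");
>> -- ...plus one single-column index each on market_num, country_num,
>> -- autosys_num, provider_owner_zone_id, provider_owner_customer_id,
>> -- provider_id, and probe_type_num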
>>
>> and
>>
>>
>> Table "portal.providers"
>> Column | Type | Modifiers
>>
>>
>> ----------------------------+-----------------------------+------------------------
>> btime | timestamp without time zone | not null
>> default now()
>> mtime | timestamp without time zone | not null
>> default now()
>> version | integer | not null
>> default 1
>> provider_id | integer | not null
>> provider_owner_zone_id | integer | not null
>> provider_owner_customer_id | integer | not null
>> provider_category_id | integer | not null
>> name | character varying(255) | not null
>> display_name | character varying(255) | not null
>>
>> with indexes on every column whose name ends in '_id'
>>
>>
> It gets more complicated:
>
> When I dropped the query from 30 days to 15 days, I saw a huge improvement
> in performance (it ran in about 16 secs); the query plan is here:
>
> http://explain.depesz.com/s/iaf
>
> Note: the query is identical to the one below, but with the interval
> changed from 30 days to 15 days, which also keeps the query within a single
> partition. The sort now requires almost no memory and occurs after the
> aggregation. I thought my problems were solved, since reducing the normal
> window over which queries are performed is something the app can tolerate.
>
> However, if I keep the same 15-day window (so the row count is roughly the
> same) but shift its start date by 2 days (still keeping the entire query
> within the same partition), I get a completely different query plan. There
> is effectively no difference between the two queries other than the start
> date of the time window in the WHERE clause, yet one executes in twice the
> time (35 secs or thereabouts):
>
> http://explain.depesz.com/s/LA
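>
> (I haven't ruled out a statistics cliff around that start date; the
> planner's view of the time column's histogram is visible via pg_stats,
> e.g.:)
>
> SELECT tablename, attname, n_distinct, histogram_bounds
> FROM pg_stats
> WHERE attname = 'time'
>   AND tablename LIKE '%radar_performance_fact%';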
>
> Just for completeness' sake, I changed the query so that it still covers 15
> days but this time crosses a partition boundary. The plan is very similar
> to the previous one and executes in about the same time (35 secs or so):
>
> http://explain.depesz.com/s/Aqw
>
> Statistics are up to date and were gathered with
> default_statistics_target = 100.
>
> Is there any way I can force the more efficient HashAggregate-then-sort
> plan instead of the sort-then-GroupAggregate plan?
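>
> (One experiment I know of: the session-scoped planner toggles. As a
> diagnostic only, not for production, I could penalize sort-based plans and
> see what the planner picks instead:)
>
> BEGIN;
> SET LOCAL enable_sort = off;  -- makes sort-based plans look very expensive
> EXPLAIN ANALYZE SELECT ... ;  -- the same query as above
> ROLLBACK;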
>
