Slow query on join with Date >=

From: Jim Treinen <jtreinen(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow query on join with Date >=
Date: 2014-01-28 22:01:32
Message-ID: CAGtdQrkLhpk7hJ0P-i9m3A5hHVXgZdYSC4LeqSCxPJzhnPuUmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a performance problem using a dimensional model where the date is
specified in a DATE dimension, specifically when using 'WHERE DATE >= 'Some
Date'

This query runs very fast when using an equality expression, eg. 'WHERE
DATE = '2014-01-01", and I'm wondering if there is a way to make it run
fast when using the greater than expression.

The dimension table is about 5k rows, and the Fact table is ~60M.

Thanks in advance for any advice.

JT.

The query :

select sid, count(*) from fact fact_data fact left outer join dim_date dim
on dim.date_id = fact.date_id where dim.date >= '2014-1-25' group by sid
order by count desc limit 10;

FACT Table Definition:

Table "public.fact_data"
Column | Type | Modifiers
---------------+-----------------------------+-----------
date_id | integer |
date | timestamp without time zone |
agent_id | integer |
instance_id | integer |
sid | integer |
Indexes:
"fact_agent_id" btree (agent_id)
"fact_date_id" btree (date_id) CLUSTER
"fact_alarms_sid" btree (sid)

Table "public.dim_date"
Column | Type | Modifiers

--------------------+---------+------------------------------------------------------------
date_id | integer | not null default
nextval('dim_date_date_id_seq'::regclass)
date | date |
year | integer |
month | integer |
month_name | text |
day | integer |
day_of_year | integer |
weekday_name | text |
calendar_week | integer |
quarter | text |
year_quarter | text |
year_month | text |
year_calendar_week | text |
weekend | text |
week_start_date | date |
week_end_date | date |
month_start_date | date |
month_end_date | date |
Indexes:
"dim_date_date" btree (date)
"dim_date_date_id" btree (date_id)

EXPLAIN Output:

explain (analyze, buffers) select dim.date_id, fact.sid, count(1) from
fact_data fact left outer join dim_date dim on dim.date_id = fact.date_id
where dim.date_id >= 5139 group by 1,2 order by 3 desc limit 10;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=9772000.55..9772000.58 rows=10 width=8) (actual
time=91064.421..91064.440 rows=10 loops=1)
Buffers: shared hit=4042 read=1542501
-> Sort (cost=9772000.55..9787454.06 rows=6181404 width=8) (actual
time=91064.408..91064.414 rows=10 loops=1)
Sort Key: (count(1))
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=4042 read=1542501
-> GroupAggregate (cost=9150031.23..9638422.63 rows=6181404
width=8) (actual time=90892.625..91063.905 rows=617 loops=1)
Buffers: shared hit=4042 read=1542501
-> Sort (cost=9150031.23..9256675.57 rows=42657736
width=8) (actual time=90877.129..90964.995 rows=124965 loops=1)
Sort Key: dim.date_id, fact.sid
Sort Method: quicksort Memory: 8930kB
Buffers: shared hit=4042 read=1542501
-> Hash Join (cost=682.34..3160739.50 rows=42657736
width=8) (actual time=45087.394..90761.624 rows=124965 loops=1)
Hash Cond: (fact.date_id = dim.date_id)
Buffers: shared hit=4042 read=1542501
-> Seq Scan on fact_data fact
(cost=0.00..2139866.40 rows=59361340 width=8) (actual
time=0.090..47001.500 rows=59360952 loops=1)
Buffers: shared hit=3752 read=1542501
-> Hash (cost=518.29..518.29 rows=13124
width=4) (actual time=21.083..21.083 rows=13125 loops=1)
Buckets: 2048 Batches: 1 Memory Usage:
462kB
Buffers: shared hit=290
-> Seq Scan on dim_date dim
(cost=0.00..518.29 rows=13124 width=4) (actual time=0.494..10.918
rows=13125 loops=1)
Filter: (date_id >= 5139)
Rows Removed by Filter: 5138
Buffers: shared hit=290
Total runtime: 91064.496 ms
(25 rows)

Browse pgsql-performance by date

  From Date Subject
Next Message Peter Blair 2014-01-29 19:12:33 Re: Select hangs and there are lots of files in table and index directories.
Previous Message Peter Blair 2014-01-28 14:47:48 Re: Select hangs and there are lots of files in table and index directories.