Re: Query running slow

From: PFC <lists(at)peufeu(dot)com>
To: "samantha mahindrakar" <sam(dot)mahindrakar(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query running slow
Date: 2008-04-16 23:19:38
Message-ID: op.t9p8e0oocigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 16 Apr 2008 23:14:11 +0200, samantha mahindrakar
<sam(dot)mahindrakar(at)gmail(dot)com> wrote:

> Hi....
> Iam finding the following query is working a bit slow:
> EXECUTE '(SELECT ARRAY(SELECT DISTINCT date_part(''day'',
> measurement_start)
> FROM ' || gettablestring(dates)|| '
> WHERE lane_id IN (' || lanesidarr || ')))'
> INTO temparr;
>
> This function is trying to find all the days in a prticular month
> whihc has data for the particular lane and put it in an array...which
> can be used later.
> gettablestring(dates) returns the partition name from which the data
> needs to be extracted. These partitions have index on the
> measurement_start field.
> lanesidarr is a lane number. The partition has an index on this field to.
> Could anyone give me some hints???/

OK so I guess you have one partition per month since there is no month in
your WHERE.
If this is a table which hasn't got much write activity (probably the
case for last month's partition, for instance), CLUSTER it on something
appropriate that you use often in queries, like lane_id here.
And you can use SELECT foo GROUP BY foo, this will use a hash, it is
faster than a sort.
Example :

CREATE TABLE blop AS SELECT '2008-01-01'::TIMESTAMP + ((n%30)*'1
DAY'::INTERVAL) AS t FROM generate_series(1,100000) AS n;
ALTER TABLE blop ADD d DATE NULL;
UPDATE blop SET d=t;
VACUUM FULL ANALYZE blop;

-- Now blop contains 100K timestamps and 100K dates from the month 2008-01

EXPLAIN ANALYZE SELECT DISTINCT EXTRACT( DAY from t ) FROM blop;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Unique (cost=10051.82..10551.82 rows=30 width=8) (actual
time=221.740..289.801 rows=30 loops=1)
-> Sort (cost=10051.82..10301.82 rows=100000 width=8) (actual
time=221.737..250.911 rows=100000 loops=1)
Sort Key: (date_part('day'::text, t))
Sort Method: quicksort Memory: 5955kB
-> Seq Scan on blop (cost=0.00..1747.00 rows=100000 width=8)
(actual time=0.021..115.254 rows=100000 loops=1)
Total runtime: 290.237 ms
(6 lignes)

Temps : 290,768 ms

EXPLAIN ANALYZE SELECT EXTRACT( DAY from t ) AS day FROM blop GROUP BY day;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1997.00..1997.38 rows=30 width=8) (actual
time=198.375..198.390 rows=30 loops=1)
-> Seq Scan on blop (cost=0.00..1747.00 rows=100000 width=8) (actual
time=0.021..129.779 rows=100000 loops=1)
Total runtime: 198.437 ms
(3 lignes)

Temps : 198,894 ms

==> Hash is faster than Sort

EXPLAIN ANALYZE SELECT d FROM blop GROUP BY d;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1747.00..1747.30 rows=30 width=4) (actual
time=101.829..101.842 rows=30 loops=1)
-> Seq Scan on blop (cost=0.00..1497.00 rows=100000 width=4) (actual
time=0.012..33.428 rows=100000 loops=1)
Total runtime: 101.905 ms
(3 lignes)

Temps : 102,516 ms

==> Not computing the EXTRACT is faster obviously

(actually EXPLAIN ANALYZE adds some overhead, the query really takes 60 ms)

If you have an index lane_id, measurement_date, you can always do :

for day in 1..31:
find 1 row with which has this day
reutrn the days you found

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Andrew Sullivan 2008-04-17 03:19:47 Re: Oddly slow queries
Previous Message Craig James 2008-04-16 22:47:08 Re: Anybody using the Dell Powervault MD3000 array?