Skip site navigation (1) Skip section navigation (2)

Re: Querying 19million records very slowly

From: Paul Ramsey <pramsey(at)refractions(dot)net>
To: Kjell Tore Fossbakk <kjelltore(at)gmail(dot)com>
Cc: PFC <lists(at)boutiquenumerique(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Querying 19million records very slowly
Date: 2005-06-22 14:20:13
Message-ID: BF4268C3-E328-11D9-B899-000393D33C2E@refractions.net (view raw or flat)
Thread:
Lists: pgsql-performance
OK, so the planner is in fact making a mistake (I think).  Try turning  
down your random_page_cost a little. It defaults at 4.0, see if 2.0  
works "right". (Careful, move these things around too much for one  
query, you will wreck others.)  4.0 is a little large for almost all  
modern hardware, so see if moving it down a little makes things  
somewhat smarter.

P

On Wednesday, June 22, 2005, at 12:45 AM, Kjell Tore Fossbakk wrote:

> database=> set enable_seqscan to on;
> SET
> Time: 0.34 ms
>
>
>
> database=> explain analyze select count(*) from test where p1=53;
>                                                       QUERY PLAN
> ----------------------------------------------------------------------- 
> ------------------------------------------------
>  Aggregate  (cost=522824.50..522824.50 rows=1 width=0) (actual
> time=56380.72..56380.72 rows=1 loops=1)
>    ->  Seq Scan on test  (cost=0.00..517383.30 rows=2176479 width=0)
> (actual time=9.61..47677.48 rows=2220746 loops=1)
>          Filter: (p1 = 53)
>  Total runtime: 56380.79 msec
> (4 rows)
>
> Time: 56381.40 ms
>
>
>
> database=> explain analyze select count(*) from test where p1=53 and
> time > now() - interval '24 hours' ;
>                                                        QUERY PLAN
> ----------------------------------------------------------------------- 
> -------------------------------------------------
>  Aggregate  (cost=661969.01..661969.01 rows=1 width=0) (actual
> time=45787.02..45787.02 rows=1 loops=1)
>    ->  Seq Scan on test  (cost=0.00..660155.28 rows=725493 width=0)
> (actual time=37799.32..45613.58 rows=42424 loops=1)
>          Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
>  Total runtime: 45787.09 msec
> (4 rows)
>
> Time: 45787.79 ms
>
>
>
> database=> explain analyze select date_trunc('hour', time),count(*) as
> total from test where p1=53 and time>now()-interval '24 hours' group
> by date_trunc order by date_trunc;
>                                                              QUERY
> PLAN
> ----------------------------------------------------------------------- 
> -------------------------------------------------------------
>  Aggregate  (cost=755116.97..760558.17 rows=72549 width=8) (actual
> time=46040.63..46717.61 rows=23 loops=1)
>    ->  Group  (cost=755116.97..758744.44 rows=725493 width=8) (actual
> time=46022.06..46548.84 rows=42407 loops=1)
>          ->  Sort  (cost=755116.97..756930.70 rows=725493 width=8)
> (actual time=46022.04..46198.94 rows=42407 loops=1)
>                Sort Key: date_trunc('hour'::text, "time")
>                ->  Seq Scan on test  (cost=0.00..660155.28 rows=725493
> width=8) (actual time=37784.91..45690.88 rows=42407 loops=1)
>                      Filter: ((p1 = 53) AND ("time" > (now() - '1
> day'::interval)))
>  Total runtime: 46718.43 msec
> (7 rows)
>
> Time: 46719.44 ms
>
>
>
> database=> create index test_time_p1_idx on test(time,p1) ;
> CREATE INDEX
> Time: 178926.02 ms
>
> database=> vacuum analyze test ;
> VACUUM
> Time: 73058.33 ms
>
> database=> \d test
>                Table "public.test"
>   Column     |           Type           | Modifiers
> -------------+--------------------------+-----------
>  time        | timestamp with time zone |
>  source      | inet                     |
>  destination | inet                     |
>  p1          | integer                  |
>  p2          | integer                  |
>
>
>
> database=> \di
>  public | test_time_idx        | index | database | test
>  public | test_source_idx      | index | database | test
>  public | test_destination_idx | index | database | test
>  public | test_p1_idx          | index | database | test
>  public | test_p2_idx          | index | database | test
>  public | test_time_p1_idx     | index | database | test
>
>
>
> database=> set enable_seqscan to off ;
> SET
> Time: 0.28 ms
>
>
>
> database=> explain analyze select date_trunc('hour', time),count(*) as
> total from test where p1=53 and time>now()-interval '24 hours' group
> by date_trunc order by date_trunc;
>  Aggregate  (cost=2315252.66..2320767.17 rows=73527 width=8) (actual
> time=2081.15..2720.44 rows=23 loops=1)
>    ->  Group  (cost=2315252.66..2318929.00 rows=735268 width=8)
> (actual time=2079.76..2564.22 rows=41366 loops=1)
>          ->  Sort  (cost=2315252.66..2317090.83 rows=735268 width=8)
> (actual time=2079.74..2243.32 rows=41366 loops=1)
>                Sort Key: date_trunc('hour'::text, "time")
>                ->  Index Scan using test_time_p1_idx on test
> (cost=0.00..2218878.46 rows=735268 width=8) (actual
> time=29.50..1774.52 rows=41366 loops=1)
>                      Index Cond: (("time" > (now() - '1
> day'::interval)) AND (p1 = 53))
>  Total runtime: 2735.42 msec
>
> Time: 2736.48 ms
>
>
>
> database=> explain analyze select date_trunc('hour', time),count(*) as
> total from test where p1=80 and time>now()-interval '24 hours' group
> by date_trunc order by date_trunc;
>  Aggregate  (cost=1071732.15..1074305.59 rows=34313 width=8) (actual
> time=6353.93..7321.99 rows=22 loops=1)
>    ->  Group  (cost=1071732.15..1073447.77 rows=343125 width=8)
> (actual time=6323.76..7078.10 rows=64267 loops=1)
>          ->  Sort  (cost=1071732.15..1072589.96 rows=343125 width=8)
> (actual time=6323.75..6579.42 rows=64267 loops=1)
>                Sort Key: date_trunc('hour'::text, "time")
>                ->  Index Scan using test_time_p1_idx on test
> (cost=0.00..1035479.58 rows=343125 width=8) (actual time=0.20..5858.67
> rows=64267 loops=1)
>                      Index Cond: (("time" > (now() - '1
> day'::interval)) AND (p1 = 80))
>  Total runtime: 7322.82 msec
>
> Time: 7323.90 ms
>
>
>
> database=> explain analyze select date_trunc('hour', time),count(*) as
> total from test where p1=139 and time>now()-interval '24 hours' group
> by date_trunc order by date_trunc;
>  Aggregate  (cost=701562.34..703250.12 rows=22504 width=8) (actual
> time=2448.41..3033.80 rows=22 loops=1)
>    ->  Group  (cost=701562.34..702687.53 rows=225037 width=8) (actual
> time=2417.39..2884.25 rows=36637 loops=1)
>          ->  Sort  (cost=701562.34..702124.94 rows=225037 width=8)
> (actual time=2417.38..2574.19 rows=36637 loops=1)
>                Sort Key: date_trunc('hour'::text, "time")
>                ->  Index Scan using test_time_p1_idx on test
> (cost=0.00..679115.34 rows=225037 width=8) (actual time=8.47..2156.18
> rows=36637 loops=1)
>                      Index Cond: (("time" > (now() - '1
> day'::interval)) AND (p1 = 139))
>  Total runtime: 3034.57 msec
>
> Time: 3035.70 ms
>
>
>
> Now, this query gives me all the hours in a day, with the count of all
> p1=53 for each hour. Pg uses 46.7 seconds to run with seqscan, while
> 2.7 seconds indexing on (time,p1). I think I turned "set
> enable_seqscan to on;" again, and then the planner used seqscan, and
> not index.
> - Why does Pg not see the benefits of using index?
> - and how can i tune the optimisation fields in postgresql.conf to  
> help him?
>
> So now my PG uses a reasonable amout of time on these queries (with
> enable_seqscan turned off)
>
> The next place which seems to slow my queries, is probably my
> connection to PHP. I got a bash script running in cron on my server
> (freebsd 4.11), which runs php on a php file. To force PG to not use
> seqscans, I have modifies the postgresql.conf:
>
> ..
> enable_seqscan = false
> enable_indexscan = true
> ..
> effective_cache_size = 10000
> random_page_cost = 2
> ..
>
> I save the file, type 'pg_crl reload' then enter 'psql database'.
>
> database=> show enable_seqscan ;
>  enable_seqscan
> ----------------
>  on
> (1 row)
>
>
> argus=> show effective_cache_size ;
>  effective_cache_size
> ----------------------
>  1000
> (1 row)
>
> I have used the manual pages on postgresql, postmaster, and so on, but
> I cant find anywhere to specify which config file Pg is to use. I'm
> not entirely sure if he uses the one im editing
> (/usr/local/etc/postgresql.conf).
>
> Any hints, tips or help is most appreciated!
>
> Kjell Tore.
>
>
>
>
>
> On 6/21/05, PFC <lists(at)boutiquenumerique(dot)com> wrote:
>>
>> use CURRENT_TIME which is a constant instead of now() which is not
>> considered constant... (I think)
>>
>>
      Paul Ramsey
      Refractions Research
      Email: pramsey(at)refractions(dot)net
      Phone: (250) 885-0632


In response to

Responses

pgsql-performance by date

Next:From: Kjell Tore FossbakkDate: 2005-06-22 14:41:54
Subject: Re: Querying 19million records very slowly
Previous:From: Tom LaneDate: 2005-06-22 13:50:39
Subject: Re: Querying 19million records very slowly

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group