Skip site navigation (1) Skip section navigation (2)

slow plan on join when adding where clause

From: "L(dot) Loewe" <lloewe(at)hotmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: slow plan on join when adding where clause
Date: 2010-03-29 00:19:04
Message-ID: op.vaa1x2utb4gxma@oakcreek2 (view raw or whole thread)
Lists: pgsql-novice
Hi all

I have a fairly large table ("main" ~50M rows) containing a timestamp  
  a sensor_id (indexed) and some sensor data ranging over a few years.

The second table ("events" <500 rows) contains a set of interesting events  
an and two timestamps: events.start and events.stop plus some
additional data.

Now I want to join these tables to get the sensor data for those events.

The interval between start and stop is quite short for each event (usually  
couple of minutes) so that there aren't too many rows from table "main"
matching this criteria (~1K) for each event:

SELECT * FROM events, main WHERE main.ti > events.start and main.ti <  


Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
                  (actual time=0.038..42.314 rows=69209 loops=1)
    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
                 (actual time=0.006..0.025 rows=124 loops=1)
    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16  
rows=4229880 width=40)
                 (actual time=0.016..0.178 rows=558 loops=124)
          Index Cond: ((main.ti > events.start) AND (main.ti < events.stop))
Total runtime: 47.682 ms

So far so good, however if I add a sensor_id constraint the planner  
chooses a
different approach:

SELECT  * FROM  events, main WHERE  main.ti > events.start and
                                     main.ti < events.stop and sensor_id=1;

Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
                 (actual time=23427.599..23886.276 rows=772 loops=1)
    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087  
                 (actual time=3771.719..9508.728 rows=490984 loops=1)
          Recheck Cond: (sensor_id= 1)
          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94  
rows=348087 width=0)
                 (actual time=3769.075..3769.075 rows=491102 loops=1)
                Index Cond: (sensor_id= 1)
    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
                 (actual time=0.000..0.010 rows=124 loops=490984)
          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
                (actual time=0.005..0.021 rows=124 loops=1)
Total runtime: 23886.494 ms

Apparently the planner assumes that sensor_id=1 will return fewer rows than
the time constraint while the opposite is true:
sensor_id=1 -> ~ 500K , time window -> ~ 1K.

Is there a way to hint the planner to use plan 1 even with the sensor_id=1
clause or am I doing something fundamentally wrong here?



pgsql-novice by date

Next:From: Carel CombrinkDate: 2010-03-29 07:21:19
Subject: Return row after a Insert
Previous:From: dipti shahDate: 2010-03-28 09:53:48
Subject: Designing Postgres Security Model

Privacy Policy | About PostgreSQL
Copyright © 1996-2015 The PostgreSQL Global Development Group