Re: slow plan on join when adding where clause

From: "L(dot) Loewe" <lloewe(at)hotmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Re: slow plan on join when adding where clause
Date: 2010-04-02 04:11:22
Message-ID: op.vairc8l7b4gxma@oakcreek2
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Thanks.

That did change the plan a bit but it's still not doing it the way
it does it when searching all sensors - and it's still a lot slower.

Nested Loop (cost=2.36..1482129.06 rows=2629241 width=68)
(actual time=30983.301..31827.299 rows=772 loops=1)
Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
-> Seq Scan on main (cost=0.00..890547.50 rows=190832 width=40)
(actual time=54.095..18136.153 rows=492150 loops=1)
Filter: ((sensor_id + 0) = 1)
-> Materialize (cost=2.36..3.60 rows=124 width=28)
(actual time=0.000..0.009 rows=124 loops=492150)
-> Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
(actual time=0.010..0.021 rows=124 loops=1)
Total runtime: 31827.435 ms

I'm not sure the index is the problem - seems more like the way it
assembles the data.
If I read this right the planner takes the rows matching sensor_id=1
first and then joins the result with the time slices of events.

This'd be probably a good idea if events had a lot of rows or the time
slices were large.
But with the data at hand it is a lot faster to take the rows
of events first and then match each one to main.ti (which is what
the planner does without the where clause).

It's understandable that the planner cannot properly figure out
how many rows these start-stop slices will select, however since
it appears to make an assumption I was looking for a way to
tell it which value to assume...

Regards

On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala(at)vmsinfo(dot)com>
wrote:

> There is an old trick which can help you here. Try doing this:
> SELECT * FROM events, main WHERE main.ti > events.start and
> main.ti < events.stop and
> sensor_id+0=1;
>
> That will stop planner from merging two indexes and using bitmap.
>
> L. Loewe wrote:
>> Hi all
>>
>>
>> I have a fairly large table ("main" ~50M rows) containing a timestamp
>> (indexed),
>> a sensor_id (indexed) and some sensor data ranging over a few years.
>>
>> The second table ("events" <500 rows) contains a set of interesting
>> events with
>> an events.id and two timestamps: events.start and events.stop plus some
>> additional data.
>>
>> Now I want to join these tables to get the sensor data for those events.
>>
>> The interval between start and stop is quite short for each event
>> (usually a
>> couple of minutes) so that there aren't too many rows from table "main"
>> matching this criteria (~1K) for each event:
>>
>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti <
>> events.stop;
>>
>> EXPLAIN ANALYZE gives
>>
>> Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68)
>> (actual time=0.038..42.314 rows=69209 loops=1)
>> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>> (actual time=0.006..0.025 rows=124 loops=1)
>> -> Index Scan using idx_main_ti on main (cost=0.00..159306.16
>> rows=4229880 width=40)
>> (actual time=0.016..0.178 rows=558 loops=124)
>> Index Cond: ((main.ti > events.start) AND (main.ti <
>> events.stop))
>> Total runtime: 47.682 ms
>>
>> So far so good, however if I add a sensor_id constraint the planner
>> chooses a
>> different approach:
>>
>>
>> SELECT * FROM events, main WHERE main.ti > events.start and
>> main.ti < events.stop and
>> sensor_id=1;
>>
>> Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68)
>> (actual time=23427.599..23886.276 rows=772 loops=1)
>> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>> -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
>> width=40)
>> (actual time=3771.719..9508.728 rows=490984 loops=1)
>> Recheck Cond: (sensor_id= 1)
>> -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
>> rows=348087 width=0)
>> (actual time=3769.075..3769.075 rows=491102 loops=1)
>> Index Cond: (sensor_id= 1)
>> -> Materialize (cost=2.36..3.60 rows=124 width=28)
>> (actual time=0.000..0.010 rows=124 loops=490984)
>> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>> (actual time=0.005..0.021 rows=124 loops=1)
>> Total runtime: 23886.494 ms
>>
>>
>> Apparently the planner assumes that sensor_id=1 will return fewer rows
>> than
>> the time constraint while the opposite is true:
>> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
>>
>> Is there a way to hint the planner to use plan 1 even with the
>> sensor_id=1
>> clause or am I doing something fundamentally wrong here?
>>
>> Thanks
>>
>

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message EvilJonny 2010-04-02 17:14:56 Read only postgres server
Previous Message Mladen Gogala 2010-04-01 18:19:09 Re: slow plan on join when adding where clause