Skip site navigation (1) Skip section navigation (2)

Re: slow plan on join when adding where clause

From: syan tan <kittylitter(at)people(dot)net(dot)au>
To:
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: slow plan on join when adding where clause
Date: 2010-04-07 00:10:42
Message-ID: 1270599042.3241.11.camel@syan-desktop (view raw or flat)
Thread:
Lists: pgsql-novice
Why doesn't the planner do a pruning lower cost ratio threshold
alternative join implementation search ? There's only seq scan, and
3 types of index scans ( hash, btree, and bitmap) , so how hard would
it be for a planner to look at the alternatives e.g. 3 joins , 
4^3  choices of joins , compare 64 choices of plans for less complicated
queries ? 
Also, has anyone ever tried to put some sort of functional
dependencies / normalization tool in contrib of postgresql, or is it
enough that postgresql encourages writing hacked up queries as well as 
anomaly fixing triggers capability , making it feature complete for the
market ? 
On Thu, 2010-04-01 at 22:11 -0600, L. Loewe wrote:
> Thanks.
> 
> That did change the plan a bit but it's still not doing it the way
> it does it when searching all sensors - and it's still a lot slower.
> 
> 
>   Nested Loop  (cost=2.36..1482129.06 rows=2629241 width=68)
>               (actual time=30983.301..31827.299 rows=772 loops=1)
>     Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>     ->  Seq Scan on main  (cost=0.00..890547.50 rows=190832 width=40)
>               (actual time=54.095..18136.153 rows=492150 loops=1)
>           Filter: ((sensor_id + 0) = 1)
>     ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>               (actual time=0.000..0.009 rows=124 loops=492150)
>           ->  Seq Scan on events  (cost=0.00..2.24 rows=124 width=28)
> 		     (actual time=0.010..0.021 rows=124 loops=1)
>   Total runtime: 31827.435 ms
> 
> I'm not sure the index is the problem - seems more like the way it
> assembles the data.
> If I read this right the planner takes the rows matching sensor_id=1
> first and then joins the result with the time slices of events.
> 
> This'd be probably a good idea if events had a lot of rows or the time
> slices were large.
> But with the data at hand it is a lot faster to take the rows
> of events first and then match each one to main.ti (which is what
> the planner does without the where clause).
> 
> It's understandable that the planner cannot properly figure out
> how many rows these start-stop slices will select, however since
> it appears to make an assumption I was looking for a way to
> tell it which value to assume...
> 
> Regards
> 
> 
> 
> 
> On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala(at)vmsinfo(dot)com>  
> wrote:
> 
> > There is an old trick which can help you here. Try doing this:
> > SELECT  * FROM  events, main WHERE  main.ti > events.start and
> >                                     main.ti < events.stop and  
> > sensor_id+0=1;
> >
> > That will stop planner from merging two indexes and using bitmap.
> >
> > L. Loewe wrote:
> >> Hi all
> >>
> >>
> >> I have a fairly large table ("main" ~50M rows) containing a timestamp  
> >> (indexed),
> >>  a sensor_id (indexed) and some sensor data ranging over a few years.
> >>
> >> The second table ("events" <500 rows) contains a set of interesting  
> >> events with
> >> an events.id and two timestamps: events.start and events.stop plus some
> >> additional data.
> >>
> >> Now I want to join these tables to get the sensor data for those events.
> >>
> >> The interval between start and stop is quite short for each event  
> >> (usually a
> >> couple of minutes) so that there aren't too many rows from table "main"
> >> matching this criteria (~1K) for each event:
> >>
> >> SELECT * FROM events, main WHERE main.ti > events.start and main.ti <  
> >> events.stop;
> >>
> >> EXPLAIN ANALYZE gives
> >>
> >> Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
> >>                  (actual time=0.038..42.314 rows=69209 loops=1)
> >>    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
> >>                 (actual time=0.006..0.025 rows=124 loops=1)
> >>    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16  
> >> rows=4229880 width=40)
> >>                 (actual time=0.016..0.178 rows=558 loops=124)
> >>          Index Cond: ((main.ti > events.start) AND (main.ti <  
> >> events.stop))
> >> Total runtime: 47.682 ms
> >>
> >> So far so good, however if I add a sensor_id constraint the planner  
> >> chooses a
> >> different approach:
> >>
> >>
> >> SELECT  * FROM  events, main WHERE  main.ti > events.start and
> >>                                     main.ti < events.stop and  
> >> sensor_id=1;
> >>
> >> Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
> >>                 (actual time=23427.599..23886.276 rows=772 loops=1)
> >>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
> >>    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087  
> >> width=40)
> >>                 (actual time=3771.719..9508.728 rows=490984 loops=1)
> >>          Recheck Cond: (sensor_id= 1)
> >>          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94  
> >> rows=348087 width=0)
> >>                 (actual time=3769.075..3769.075 rows=491102 loops=1)
> >>                Index Cond: (sensor_id= 1)
> >>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
> >>                 (actual time=0.000..0.010 rows=124 loops=490984)
> >>          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
> >>                (actual time=0.005..0.021 rows=124 loops=1)
> >> Total runtime: 23886.494 ms
> >>
> >>
> >> Apparently the planner assumes that sensor_id=1 will return fewer rows  
> >> than
> >> the time constraint while the opposite is true:
> >> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
> >>
> >> Is there a way to hint the planner to use plan 1 even with the  
> >> sensor_id=1
> >> clause or am I doing something fundamentally wrong here?
> >>
> >> Thanks
> >>
> >
> 



In response to

pgsql-novice by date

Next:From: venkatrao.bDate: 2010-04-07 05:30:42
Subject: How to find object dependencies in PG 8.4
Previous:From: Jaime CasanovaDate: 2010-04-06 20:31:24
Subject: Re: TRANSACTION FOR AN UPDATE COMMAND WITH ONE TABLE

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group