Re: multiple joins + Order by + LIMIT query performance issue

From: Antoine Baudoux <ab(at)taktik(dot)be>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Shaun Thomas <sthomas(at)leapfrogonline(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: multiple joins + Order by + LIMIT query performance issue
Date: 2008-05-07 07:23:51
Message-ID: A97B7728-FCF9-4F66-AA64-E3F81B31F9F8@taktik.be
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>
> If a misestimate of this kind is bugging you enough that you're
> willing
> to change the query, I think you can fix it like this:
>
> select ... from foo order by x limit n;
> =>
> select ... from (select ... from foo order by x) ss limit n;
>
> The subselect will be planned without awareness of the LIMIT, so you
> should get a plan using a sort rather than one that bets on the LIMIT
> being reached quickly.

I tried that, using a subquery. Unfortunately this does not change
anything :

select * from (select * from t_Event event
inner join t_Service service on event.service_id=service.id
inner join t_System system on service.system_id=system.id
inner join t_Interface interface on system.id=interface.system_id
where (interface.network_id=1) order by event.c_date desc ) ss limit 25

"Limit (cost=147.79..5563.93 rows=25 width=3672)"
" -> Subquery Scan ss (cost=147.79..2896263.01 rows=13368
width=3672)"
" -> Nested Loop (cost=147.79..2896129.33 rows=13368
width=958)"
" Join Filter: (service.id = event.service_id)"
" -> Index Scan Backward using event_date_idx on t_event
event (cost=0.00..1160633.69 rows=8569619 width=344)"
" -> Materialize (cost=147.79..147.88 rows=9 width=614)"
" -> Hash Join (cost=16.56..147.79 rows=9
width=614)"
" Hash Cond: (service.system_id = system.id)"
" -> Seq Scan on t_service service
(cost=0.00..109.28 rows=5828 width=40)"
" -> Hash (cost=16.55..16.55 rows=1
width=574)"
" -> Nested Loop (cost=0.00..16.55
rows=1 width=574)"
" -> Index Scan using
interface_network_id_idx on t_interface interface (cost=0.00..8.27
rows=1 width=558)"
" Index Cond: (network_id =
1)"
" -> Index Scan using
t_system_pkey on t_system system (cost=0.00..8.27 rows=1 width=16)"
" Index Cond: (system.id =
interface.system_id)"

The worst thing about all this is that there are ZERO rows to join
with the t_event table. So the planner decide to index-scan 8 millions
row, where there is no hope of finding a match!
This seems a very ,very , very poor decision

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Albe Laurenz *EXTERN* 2008-05-07 07:29:05 Re: RAID 10 Benchmark with different I/O schedulers
Previous Message Josh Cole 2008-05-07 05:29:38 Re: pgfouine - commit details?