Re: Odd planner choice?

From: "Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Odd planner choice?
Date: 2004-10-08 20:29:57
Message-ID: 416706D5.13084.2A2108B2@localhost
Lists: pgsql-performance

On 8 Oct 2004 at 16:04, Tom Lane wrote:

> "Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk> writes:
> > If I remove the redundant clauses, the planner now estimates 1000 rows returned from
> > the table, not unreasonable since it has no statistics. But *why* in that case, with *more*
> > estimated rows does it choose to materialize that table (26 rows) 573416 times!!!
>
> It isn't. It's materializing that once and scanning it 573416 times,
> once for each row in the outer relation. And this is not a bad plan
> given the estimates. If it had stuck to what you call the good plan,
> and there *had* been 1000 rows in the temp table, that plan would have
> run 1000 times longer than it did.
>
> As a general rule, if your complaint is that you get a bad plan for an
> unanalyzed table, the response is going to be "so analyze the table".
>

The problem in this case is that if I *do* analyse the table I *always* get the bad plan.
Bad in this case means the query takes a lot longer. I'm still not sure why it can't
choose the better plan of reading the 26 rows once and index scanning the
booking_plan table 26 times (as in the "good" plan).

OK, with a 1000 row estimate I can see that index scanning into the booking_plan
table 1000 times would take some time, but even if the planner estimates 5 rows it still
produces the same slow plan.

If I analyze the table it then knows there are 26 rows and therefore always chooses the slow plan.
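
(The estimate itself is easy to check; the table name and filter here are just the ones
from the query further down, nothing new:)

-- Shows the row estimate the planner is working from for the temp table scan:
-- rows=5 with no statistics (as in the plan below), and close to the real 26
-- once the table has been analyzed.
EXPLAIN SELECT * FROM t_search_reqt_dates WHERE search_id = 13;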

This is why I am not analyzing this table: to fool the planner into thinking there is only
one row so that it produces a much faster access plan. Not ideal, I know.
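
(For what it's worth, one way to keep just that table un-analyzed while the rest of the
database gets fresh statistics is to analyze per table rather than database-wide, along
these lines:)

-- Keep statistics fresh on the big tables...
ANALYZE staff_booking;
ANALYZE booking_plan;
-- ...but deliberately skip t_search_reqt_dates so the planner keeps
-- falling back to its default assumptions for it.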

Just using one redundant clause I now get:

select distinct b.staff_id from staff_booking b, booking_plan bp, t_search_reqt_dates rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13

QUERY PLAN
Unique  (cost=50885.97..50921.37 rows=462 width=4) (actual time=35231.000..35241.000 rows=110 loops=1)
  ->  Sort  (cost=50885.97..50903.67 rows=35397 width=4) (actual time=35231.000..35241.000 rows=2173 loops=1)
        Sort Key: b.staff_id
        ->  Hash Join  (cost=44951.32..50351.07 rows=35397 width=4) (actual time=34530.000..35231.000 rows=2173 loops=1)
              Hash Cond: ("outer".booking_id = "inner".booking_id)
              ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8) (actual time=0.000..351.000 rows=197877 loops=1)
              ->  Hash  (cost=44933.62..44933.62 rows=35397 width=4) (actual time=34530.000..34530.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=15.50..44933.62 rows=35397 width=4) (actual time=8342.000..34520.000 rows=2173 loops=1)
                          Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND ("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >= ("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
                          ->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 rows=573416 width=24) (actual time=0.000..1053.000 rows=573416 loops=1)
                          ->  Materialize  (cost=15.50..15.53 rows=5 width=20) (actual time=0.001..0.019 rows=26 loops=573416)
                                ->  Seq Scan on t_search_reqt_dates rd  (cost=0.00..15.50 rows=5 width=20) (actual time=0.000..0.000 rows=26 loops=1)
                                      Filter: (search_id = 13)
Total runtime: 35241.000 ms
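
(If it is any use as a diagnostic, the stock enable_* planner switches can show what the
index-scan alternative would be costed at; purely an experiment, not a fix:)

-- Experiment only: take away the strategies the planner is picking above
-- and see how it costs the nested-loop/index-scan alternative.
-- The settings only affect the current session.
SET enable_hashjoin = off;   -- removes the hash join at the top of the plan
SET enable_seqscan = off;    -- discourages the seq scan over booking_plan

EXPLAIN ANALYZE
select distinct b.staff_id from staff_booking b, booking_plan bp, t_search_reqt_dates rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13;

-- Put the defaults back afterwards.
RESET enable_hashjoin;
RESET enable_seqscan;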

If this is the only answer for now, then fair enough, I will just have to do more testing.

Regards,
Gary.
