Re: Slight change in query leads to unexpected change in query plan

From: Jack Orenstein <jack(dot)orenstein(at)hds(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sam Mason <sam(at)samason(dot)me(dot)uk>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Slight change in query leads to unexpected change in query plan
Date: 2009-06-23 14:55:53
Message-ID: 4A40ECF9.5090900@hds.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom Lane wrote:
> Jack Orenstein <jack(dot)orenstein(at)hds(dot)com> writes:
>> Limit (cost=0.00..324.99 rows=100 width=451)
>> -> Index Scan using t_pkey on t (cost=0.00..296027.98 rows=91088 width=451)
>> Index Cond: (pk > 1000000000)
>
>> Adding the value restriction at the top of this query plan wouldn't increase the
>> cost very much.
>
> You're missing the point: with the value restriction in place, it's
> estimating that it will probably have to scan all 91000 rows (because
> there are fewer than 100 satisfying the value restriction). And that
> is going to cost somewhere north of 296027 cost units --- the cost
> shown, plus 91000 invocations of the value-restriction check.
> Which is considerably more than the cost of the other plan.

I see -- the optimizer is calculating that it will have to examine a very large
fraction of the rows.

Actually, pk and value are highly correlated. If a row gets past the index scan,
odds are very high that the value predicate will evaluate to true. So I am sure
that the index scan is the right way to go. I'm just trying to convince the
optimizer of this.

One thing I am considering is introducing a function with high execution cost.
E.g., if I do this:

create function return_input(bytea) returns bytea
as '
begin
return $1;
end;
' language 'plpgsql' cost 10000;

explain
select *
from t
where pk > 1000000
and return_input(value = 'asdf'::bytea)
order by pk
limit 100;

then I get the plan I want.

Limit (cost=0.00..563490.32 rows=100 width=451)
-> Index Scan using t_pkey on t (cost=0.00..34226402.07 rows=6074 width=451)
Index Cond: (pk > 1000000)
Filter: (return_input(value) = 'asdf'::bytea)

Is there a more elegant way of forcing the plan that I want?

Jack

In response to

Browse pgsql-general by date

  From Date Subject
Next Message sergio nogueira 2009-06-23 14:57:33 PANIC: cannot abort transaction 140578842, it was already committed
Previous Message Chris Spotts 2009-06-23 14:47:33 Re: Explaining functions.