Overriding the optimizer

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Overriding the optimizer
Date: 2005-12-15 23:06:03
Message-ID: 43A1F6DB.3080805@modgraph-usa.com
Lists: pgsql-performance

I asked a while back if there were any plans to allow developers to override the optimizer's plan and force certain plans, and received a fairly resounding "No". The general feeling I get is that a lot of work has gone into the optimizer, and by God we're going to use it!

I think this is just wrong, and I'm curious whether I'm alone in this opinion.

Over and over, I see questions posted to this mailing list about execution plans that don't work out well. Many times there are good answers - add an index, refactor the design, etc. - that yield good results. But, all too often the answer comes down to something like this recent one:

> Right on. Some of these "coerced" plans may perform
> much better. If so, we can look at tweaking your runtime
> config: e.g.
>
> effective_cache_size
> random_page_cost
> default_statistics_target
>
> to see if said plans can be chosen "naturally".

I see this over and over. Tweak the parameters to "force" a certain plan, because there's no formal way for a developer to say, "I know the best plan."
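For reference, the "tweaking" in question is done with session-level SET commands that nudge the cost model until the desired plan happens to fall out. The values below are purely illustrative, not recommendations:

set random_page_cost = 2;             -- make index scans look cheaper
set effective_cache_size = 50000;     -- in 8 kB pages; hint at available cache
set default_statistics_target = 100;  -- finer statistics (re-run ANALYZE after)
set enable_seqscan = off;             -- the bluntest instrument of all

None of these say "use the index on row_num"; they just lean on the cost estimates and hope the planner lands where you want.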

There isn't a database in the world that is as smart as a developer, or that can have insight into things that only a developer can possibly know. Here's a real-life example that caused me major headaches. It's a trivial query, but Postgres totally blows it:

select * from my_table
where row_num >= 50000 and row_num < 100000
and myfunc(foo, bar);

How can Postgres possibly know what "myfunc()" does? In this example, my_table is about 10 million rows and row_num is indexed. When the row_num range covers fewer than about 30,000 rows, Postgres (correctly) uses a row_num index scan, then filters by myfunc(). But beyond that, it chooses a sequential scan and filters by myfunc() alone. This is just wrong. Postgres can't possibly know that myfunc() is VERY expensive. The correct plan would be to switch from the index scan to a sequential scan, but still filter on row_num first. Even if 99% of the table is selected by row_num, it should STILL filter by row_num first, and apply myfunc() only as the very last step.

How can a database with no ability to override a plan possibly cope with this?

Without the explicit ability to override the plan Postgres generates, these problems dominate our development efforts. Postgres does an excellent job optimizing 90% of the SQL we write, but the last 10% is nearly impossible to get right. We spend huge amounts of time on trial-and-error queries, second-guessing Postgres, creating unnecessary temporary tables, sticking the occasional OFFSET into a subquery to prevent merging layers, and so forth.
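The OFFSET trick, applied to the earlier row_num query, looks like this. OFFSET 0 acts as an optimization fence: Postgres will not flatten the subquery into the outer query, so the row_num restriction is applied first and myfunc() only sees the surviving rows:

select * from
  (select * from my_table
   where row_num >= 50000 and row_num < 100000
   offset 0) as t
where myfunc(foo, bar);

Nothing in the SQL standard says OFFSET 0 should behave this way; it is simply a side effect of how Postgres currently flattens subqueries, which is exactly why relying on it feels fragile.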

This same application also runs on Oracle, and although I've cursed Oracle's stupid planner many times, at least I can force it to do it right if I need to.
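For comparison, Oracle's escape hatch is an inline optimizer hint embedded in a comment. The index name below is hypothetical:

select /*+ INDEX(my_table my_table_row_num) */ *
from my_table
where row_num >= 50000 and row_num < 100000
and myfunc(foo, bar);

If the hint is wrong or the index disappears, Oracle silently ignores it and falls back to its own plan, so the query still runs.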

The danger of forced plans is that inexperienced developers tend to abuse them. So it goes -- the documentation should be clear that forced plans are always a last resort.

But there's no getting around the fact that Postgres needs a way for a developer to specify the execution plan.

Craig
