Re: About method of PostgreSQL's Optimizer

From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: pryscila(dot)lista(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About method of PostgreSQL's Optimizer
Date: 2005-09-14 16:36:37
Message-ID: 36e6829205091409361035d32a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Pryscila,

Step 2 is basically where you find the difference between a cost-based
optimizer (CBO) and a rule-based optimizer (RBO). A CBO is based on the
computed execution cost of the query whereas an RBO uses more generalized
heuristics.

Let's get an example of what you're proposing and see if we can work it out
from there.

Say we have the following (this is a generalized CBO approach, not
PostgreSQL specific):

Oracle's SCOTT.EMP table with cardinality of 1 million and an index on empno
and ename. For storage purposes say that the empno index takes up 3600
blocks, the ename index takes up 7800 blocks, and the table itself takes up
17000 blocks. We'll also say that we have a 256 megabyte buffer cache of
which we have cached 50% of the empno index, 10% of the ename index, and 5%
of the emp table data.

A user then issues the following query:

SELECT empno, ename FROM emp;

A cost-based optimizer will see the following:
1. See that the query is a full table scan (FTS) and calculate the cost of
retrieving all 17000 blocks from disk.
2. See that the query is a FTS and that it can retrieve all data from the
indexes (11400 blocks) and join the data (which join algorithm?)

Without performing a breadth-first algorithm, how can one evaluate both
options in a way that would allow you to perform heuristic transformations
dynamically? What transformation/heuristic/rule can you use? A CBO
implementation has to calculate the amount of I/O needed on each plan based
on several statistics such as what's *potentially* in the cache, what's the
access time for block I/O (including prefetching if the storage manager has
it), and other factors. If you could name a database that uses a best-first
algorithm, such as A*, please send me the link to their docs; I'd be
interested in reading the implementation.

As for using both in the same optimizer, I could only see an algorithm such
as a customized-A* being used to planning *some* large queries. The reason I
say this is because the cost calculation, which would still need to be
breadth-first, could calculate and cache the cost of most nodes thereby
allowing you to possibly perform transformations at the tail of calculation.

As for references to query optimization possibly using best-first
algorithms, I think I saw several types of algorithms used in work from a
university query optimization engine. I can't remember if it was Cornell,
Stanford, or Wisconsin... I'll try and get you a link to their info.

-Jonah

On 9/14/05, Pryscila B Guttoski <pryscila(dot)lista(at)gmail(dot)com> wrote:
>
> Hi Jonah,
>
> Thank's for your email, I really appreciate your opinions.
>
> Is it interesting to use both techniques? For example:
> Given a query, an optimizer:
> 1. Generates one of the possible execution plans.
> 2. Does transformations on the original plan, based on rules and
> heuristics, resulting in new alternative plans.
> 3. Evaluates the cost of generated plans by using statistics.
> 4. Keeps plans that have lower cost than the original plan
> 5. Repeat 2-4 over the new alternative plans.
> What do you think about it? Are there any restrictions that I haven't
> seen?
>
> About other method...
> Have you heard about using PDDL ("Planning Domain Definition
> Language") for query optimization?
>
> []'s
> Pryscila
>
>
> On 9/14/05, Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
> > Pryscila,
> >
> > While I haven't been too involved in the open source PostgreSQL
> optimizer,
> > I have done some work on it and optimizers in other database systems.
> >
> > Based on my work, it is my opinion that PostgreSQL, as-well-as other
> > databases which use a cost-based optimizer, prefer a breadth-first
> algorithm
> > because one cannot determine the "real" cost of each node at run-time
> > without systematically examining all possibilities through calculation.
> > This is the opposite of a rule-based optimizer which defines heuristics
> > which can be evaulated by a best-first algorithm such as A*.
> >
> > In a cost-based optimizer, the system must calculate the "cost" of each
> > path based on data that changes during run-time including indexing,
> > cardinality, tuple size, available memory, CPU usage, disk access times,
> > etc. To a cost-based optimizer, every query is unique and therefore
> cannot
> > follow a weighted path in the same fashion. I can certainly see A* being
> > used in a rule-based optimizer but not in a real-time cost-based
> optimizer.
> >
> > Perhaps Tom, Bruce, et al have more specifics on PostgreSQL's
> > implementation.
> >
> > -Jonah
> >
> >
> >
> >
> > On 9/13/05, Pryscila B Guttoski <pryscila(dot)lista(at)gmail(dot)com> wrote:
> > > Hello all!
> > >
> > > On my master course, I'm studying the PostgreSQL's optimizer.
> > > I don't know if anyone in this list have been participated from the
> > PostgreSQL's Optimizer development, but maybe someone can help me on
> this
> > question.
> > > PostgreSQL generates all possible plans of executing the query (using
> an
> > almost exhaustive search), then gives a cost to each plan and finally
> the
> > cheapest one is selected for execution.
> > > There are other methods for query optimization, one of them is based
> on
> > plan transformations (for example, using A-Star algorithm) instead of
> plan
> > constructions used by PostgreSQL.
> > > Does anyone know why this method was choosen? Are there any papers or
> > researches about it?
> > >
> > > Thank's a lot,
> > > Pryscila.
> > >
> >
> >
> >
> > --
> > Respectfully,
> >
> > Jonah H. Harris, Database Internals Architect
> > EnterpriseDB Corporation
> > http://www.enterprisedb.com/
> >
>

--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-09-14 16:58:21 Re: Spinlocks, yet again: analysis and proposed patches
Previous Message Stephen Frost 2005-09-14 16:19:35 Re: Spinlocks, yet again: analysis and proposed patches

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2005-09-14 17:10:32 Re: Low performance on Windows problem
Previous Message Pryscila B Guttoski 2005-09-14 15:52:50 Re: About method of PostgreSQL's Optimizer