Re: PostgreSQL performance enhancement when query

From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: brailateo(at)gmail(dot)com
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL performance enhancement when query
Date: 2006-08-08 11:24:19
Message-ID: 1155036259.21451.212.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2006-08-08 at 12:36, Constantin Teodorescu wrote:
> We have tried PGStatement#setPrepareThreshold with 1 as the threshold
> but it's not a good solution.
> Actually is worst. Considering that you have 5 different query plans,
> you are selecting approx. random one of them, not taking into account
> the statistics.

Wrong, you'll select _the same_ plan, that's what matters. If it's not
the plan you wanted, you have to rewrite the query, and try again, but
once you got the plan you wanted, it's pretty much you'll get always the
same plan. So you only need to test as long as you get the right query
to trigger the right plan... but of course this requires that your
queries are so constructed to always be OK with that plan, regardless
the parameter values. Usually this means a suboptimal plan, but stable
execution times.

If you need to give hints to the DB based on the parameter values and
choose different plans for different parameter values, then you
basically do the job of the planner in your application, and I guess
sooner or later you'll make wrong choices too.

Some hinting mechanism would be good for cases where the developer
really know better how the data is laid out (e.g. forcing the use of a
specific access method for one table in a complex join), but that
forcing a complete plan is probably not good. Even the hinting is only a
workaround for the planner fixes which will cannot make it to the stable
version...

On the daydreaming part, how about a 2 phase planner ?

Modus operandi:

Phase 1: compile and cache plan decision tree:
- collect all reasonable plans without taking into account the
parameter values;
- check the parameter bounds where each plan is the fastest;
- compile a decision tree which based on the parameter values chooses
one plan or the other;
- cache this plan decision tree;
- there's no need to cache plans which will always loose to some other
plan no matter what parameter values you give (to limit the size of the
decision tree);

Phase 2: run the decision tree to chose the best cached plan for the
parameter values;

You could use variables coming from the statistics system in the
decision tree so it doesn't have to be recalculated too often on
statistics changes.

With a system like this, you could at system startup make the decision
tree for all your frequently used queries and have fast planning at
runtime which is optimized for the parameter values (takes the decision
tree from the cache, runs it with the current parameters). Or just store
the whole thing in a system table... or tweak the decision tree
manually...

This is actually not addressing the plan stability issue, but if manual
tweaking would be allowed, it would...

Cheers,
Csaba.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message alfranio correia junior 2006-08-08 11:54:09 Re: standard interfaces for replication providers
Previous Message Constantin Teodorescu 2006-08-08 10:36:06 Re: PostgreSQL performance enhancement when query planner