Re: Perfomance of views

From: Richard Huxton <dev(at)archonet(dot)com>
To: Svenne Krap <svenne(at)krap(dot)dk>, 'Postgresql Performance' <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Perfomance of views
Date: 2005-10-27 11:29:38
Message-ID: 4360BA22.5050606@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Don't forget to CC the list

Svenne Krap wrote:
> What do you mean exactly but "pushing conditions inside" ?

If I have something like "SELECT * FROM complicated_view WHERE foo = 7"
then the planner can look "inside" complicated_view and see where it can
attach the condition "foo=7", rather than running the query and applying
the condition at the end.

There are cases where it is safe for the planner to do this, but it
isn't smart enough to do so.

> I don't think I will have the option of testing on the full queries, as
> these take many days to write (the current ones, they are replacing on a
> mssql takes up more that 5kb of query). The current ones are nightmares
> from a maintaince standpoint.

Hmm - it sounds like they would be.

> Basicly what the application is doing is selecting some base data from
> the "large" table for a point in time (usually a quarter) and selects
> all matching auxilliare data from the other tables. They are made in a
> time-travel like manner with a first and last useable date.
>
> The ways I have considered was :
> 1) write a big query in hand (not preferred as it gets hard to manage)

Agreed.

> 2) write layers of views (still not prefered as I still have to remember
> to put on the right conditions everywhere)

This is what I'd probably do, but of course I don't have full
information about your situation.

> 3) write layers of sql-functions (returning the right sets of rows from
> the underlying tables) - which I prefer from a development angel .. it
> gets very clean and I cant forget a parameter anywhere.
>
> But I seem to remember (and I have used PGSQL in production since 7.0)
> that the planner has some problems with solution 3 (i.e. estimating the
> cost and rearranging the query), but frankly that would be the way I
> would like to go.

Well, 8.x can "inline" a simple sql function into a larger query, but it
doesn't sound like that will be enough in your case. Once a function
becomes a "black box" then there's not much the planner can do to figure
out what to do.

> Based on the current (non-optimal) design and hardware constraints, I
> still have to make sure, the query runs fairly optimal - that means the
> planner must use indexes intelligently and other stuff as if it was
> (well-)written using solution 1.

Well, #1,#2 are likely to be the most efficient, but you won't know for
sure about #2 until you test it.

There are a couple of other options though:

#4 - Write a set-returning function that breaks the query into steps and
executes each in turn. So - fetch IDs from the main table in step 1 and
store them in a temporary table, join other tables in later steps.

#5 - Write a function that writes your big query for you and either
returns the SQL to your application, or runs it and returns the results.

> What do you think of the three solutions ? And is there some ressource
> about the planners capabilites for someone like me (that is very used to
> write reasonably fast and complex sql, can read c-code, but does not
> really want to dig into the source code)

There is some stuff in the "Internals" section of the manuals and it
might be worth rummaging around on http://techdocs.postgresql.org

--
Richard Huxton
Archonet Ltd

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2005-10-27 12:59:51 Re: browsing table with 2 million records
Previous Message Svenne Krap 2005-10-27 11:01:09 Re: Perfomance of views