Re: Pointers needed on optimizing slow SQL statements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Janine Sisk <janine(at)furfly(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Pointers needed on optimizing slow SQL statements
Date: 2009-06-07 20:28:24
Message-ID: 603c8f070906071328h121937a2sdc9ee3f22620015@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sat, Jun 6, 2009 at 4:50 AM, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
> The Function Index solution works, but it would be much better if we
> could get the planner to remember certain selectivities.

I agree.

> I'm thinking a command like
>
>        ANALYZE foo [WHERE .... ]
>
> which would specifically analyze the selectivity of the given WHERE
> clause for use in queries.

I think that's probably not the best syntax, because we don't want to
just do it once; we want to make it a persistent property of the table
so that every future ANALYZE run picks it up. Maybe something like:

ALTER TABLE <table> ADD ANALYZE <name> (<clause>)
ALTER TABLE <table> DROP ANALYZE <name>

(I'm not in love with this so feel free to suggest improvements.)

One possible problem with this kind of thing is that it could be
inconvenient if the number of clauses that you need to analyze is
large. For example, suppose you have a table called "object" with a
column called "type_id". It's not unlikely that the histograms and
MCVs for many of the columns in that table will be totally different
depending on the value of type_id. There might be enough different
WHERE clauses that capturing their selectivity individually wouldn't
be practical, or at least not convenient.

One possible alternative would be to change the meaning of the
<clause>, so that instead of just asking the planner to gather
selectivity on that one clause, it asks the planner to gather a whole
separate set of statistics for the case where that clause holds. Then
when we plan a query, we set the theorem-prover to work on the clauses
(a la constraint exclusion) and see if any of them are implied by the
query. If so, we can use that set of statistics in lieu of the global
table statistics. There is the small matter of figuring out what to
do if we added multiple clauses and more than one is provable, but
<insert hand-waving here>.

It would also be good to do this automatically whenever a partial
index is present.

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Laszlo Nagy 2009-06-08 11:09:48 Re: Why is my stats collector so busy?
Previous Message Dimitri Fontaine 2009-06-07 18:28:10 Re: Pointers needed on optimizing slow SQL statements