On Fri, Mar 23, 2012 at 1:51 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> Well, it's not entirely unlikely. If you step back, a web application
> looks like a big loop with a switch statement that dispatches to
> different pages. It keeps executing the same loop over and over again,
> and there are only a smallish number of web pages. Sure, the bind
> variables change, but there will only be so many bind values, and 10%
> of those will get 90% of the traffic too.

That may be true, but lots of web applications have millions of users.
Even if a few hundred thousand of those account for most of the
traffic, that is still a lot of distinct bind values, so a cache
doesn't seem likely to help much unless there are not many users in
total; and in that case it's plenty fast enough without a cache anyway.

> But the other thing that happens is that people run multiple queries
> aggregating or selecting from the same subset of data. So you often
> get things like
> select count(*) from (<complex subquery>)
> select * from (<complex subquery>) order by foo limit 10
> select * from (<complex subquery>) order by bar limit 10
> for the same <complex subquery>. That means that if we could cache the
> rows coming out of parts of a plan, and reuse those rows when we see
> another plan with a common subtree, then we could avoid a lot of
> repetitive work.
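
To make the quoted idea concrete, here is a minimal Python sketch of caching the rows of a common subtree and reusing them across the three queries. It is only an illustration, not how PostgreSQL works: `SubplanCache`, `rows_for`, and `complex_subquery` are hypothetical names, the rows are made-up `(foo, bar)` pairs, and cache invalidation when the underlying data changes is ignored entirely.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[int, int]  # hypothetical (foo, bar) row


class SubplanCache:
    """Hypothetical cache of rows produced by a plan subtree, keyed by its text."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[Row]] = {}
        self.evaluations = 0  # how many times a subtree was really executed

    def rows_for(self, key: str, compute: Callable[[], Iterable[Row]]) -> List[Row]:
        # Execute the subtree only the first time this key is seen.
        if key not in self._rows:
            self.evaluations += 1
            self._rows[key] = list(compute())
        return self._rows[key]


def complex_subquery() -> List[Row]:
    # Stand-in for an expensive subquery; the data is invented for the example.
    return [(i % 7, (i * 3) % 11) for i in range(100)]


cache = SubplanCache()
key = "<complex subquery>"

# select count(*) from (<complex subquery>)
total = len(cache.rows_for(key, complex_subquery))
# select * from (<complex subquery>) order by foo limit 10
top_foo = sorted(cache.rows_for(key, complex_subquery), key=lambda r: r[0])[:10]
# select * from (<complex subquery>) order by bar limit 10
top_bar = sorted(cache.rows_for(key, complex_subquery), key=lambda r: r[1])[:10]
```

All three queries are answered from one execution of the subquery; `cache.evaluations` records how often it actually ran.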

Currently, we don't even recognize this situation within a single plan.
For example, if you do

project pp
LEFT JOIN person sr ON pp.sales_rep_id = sr.id
LEFT JOIN person pm ON pp.project_manager_id = pm.id

the query planner will happily seq-scan the person table twice to build
two copies of the same hash table.
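
As a rough illustration of what recognizing the shared work would buy, here is a small Python sketch of a hash join that scans person once, builds one hash table on id, and probes it for both LEFT JOINs. This is not PostgreSQL's executor; `build_hash`, `left_join_probe`, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample data: person(id, name), project(name, sales_rep_id, project_manager_id)
person: List[Tuple] = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
project: List[Tuple] = [("p1", 1, 2), ("p2", 3, None), ("p3", 9, 1)]

scans = {"person": 0}  # counts how many times person is scanned


def build_hash(rows: List[Tuple], key_index: int) -> Dict[object, List[Tuple]]:
    """Scan rows once and bucket them by the join key."""
    scans["person"] += 1
    table: Dict[object, List[Tuple]] = {}
    for row in rows:
        table.setdefault(row[key_index], []).append(row)
    return table


def left_join_probe(outer: List[Tuple], key_index: int,
                    table: Dict[object, List[Tuple]], inner_width: int) -> Iterator[Tuple]:
    """LEFT JOIN: emit every outer row, padded with NULLs (None) when unmatched."""
    for row in outer:
        key = row[key_index]
        # A NULL key never matches, mirroring SQL equality semantics.
        matches = table.get(key) if key is not None else None
        if matches:
            for match in matches:
                yield row + match
        else:
            yield row + (None,) * inner_width


# Build the hash table on person.id once, then probe it for both joins.
person_by_id = build_hash(person, key_index=0)
with_sr = list(left_join_probe(project, 1, person_by_id, inner_width=2))
result = list(left_join_probe(with_sr, 2, person_by_id, inner_width=2))
```

Here `scans["person"]` stays at 1 even though person is joined twice; in the behavior described above, each join would do its own scan and build.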