Re: query cache

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Billy Earney <billy(dot)earney(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: query cache
Date: 2012-03-23 18:08:54
Message-ID: CAHyXU0ySQV+UA3jcHw_ebxumiEow7ibY890CmwZjoaObAVhY8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 23, 2012 at 12:03 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Mar 23, 2012 at 12:29 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The complication, opportunities for bugs, and general slowdown
>>> associated with that would outweigh any possible gain, in the opinion
>>> of most hackers who have thought about this.
>>
>> I wouldn't be quite so pessimistic. I think the problem is that the
>> hard part in doing this for real is all the parts the proposal glosses
>> over. How much memory is it worth dedicating to the cache before the
>> cost of that memory costs more than it helps? How do you invalidate
>> cache entries efficiently enough that it doesn't become a bottleneck?
>
> I think the question of how you would invalidate things is a very good one.
>
> The other thing that makes me skeptical of this proposal is that I am
> not very sure that executing absolutely identical queries is a very
> common use case for a relational database.  I suppose there might be a
> few queries that run over and over again (e.g. whatever you need to
> render your home page), but I think those will be the exception, and
> not the rule.  It therefore seems likely that the overhead of such a
> cache would in most cases be greater than the benefit of having it in
> the first place.
>
> What I think is more common is the repeated submission of queries that
> are *nearly* identical, but with either different parameter bindings
> or different constants.  It would be nice to have some kind of cache
> that would allow us to avoid the overhead of parsing and planning
> nearly identical statements over and over again, but the trick is that
> you have to fingerprint the query to notice that's happening in the
> first place, and the fingerprinting has to cost less than what the
> cache saves you.  I don't know whether that's possible, but I suspect
> it's far from easy.

Query cache basically addresses two use cases:
1) read only or mostly read only workloads
2) badly written application code (either by human or machine)

The problem is that #1 can be optimized by any number of simple
techniques, and #2 is not a good basis for complicated internal
features with nasty trade-offs. mysql's query cache woes are well
known -- it's typical for administrators to turn the feature off. The
feature is misnamed -- it's a 'benchmark cheating feature' since a lot
of db benchmarks tend to focus on single user loads and/or highly
repetitive queries but completely falls over in production real world
workloads. Also, it's really not that difficult to rig an ad-hoc
cache in the server or on the client side and you can then gear it
towards your particular use-case.

People that are asking for this probably really want materialized views instead.

merlin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2012-03-23 18:28:30 Re: Finer Extension dependencies
Previous Message Greg Stark 2012-03-23 17:51:12 Re: query cache