Re: query cache

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Billy Earney <billy(dot)earney(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: query cache
Date: 2012-03-29 20:18:36
Message-ID: CA+TgmobpMsUCKNhA_vbURAWvyXd+b5YDQovW1QOu4_f5kmsCnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 23, 2012 at 2:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> What I think is more common is the repeated submission of queries that
>> are *nearly* identical, but with either different parameter bindings
>> or different constants.  It would be nice to have some kind of cache
>> that would allow us to avoid the overhead of parsing and planning
>> nearly identical statements over and over again, but the trick is that
>> you have to fingerprint the query to notice that's happening in the
>> first place, and the fingerprinting has to cost less than what the
>> cache saves you.  I don't know whether that's possible, but I suspect
>> it's far from easy.
>
> The traditional solution to this is to make the application do it, ie,
> parameterized prepared statements.  Since that has direct benefits to
> the application as well, in that it can use out-of-line values and
> thereby avoid quoting and SQL-injection risks, it's not apparent that
> it's a good idea to expend lots of sweat to reverse-engineer
> parameterization from a collection of unparameterized queries.

But there are several disadvantages to unparameterized queries, too.
One, it requires more messages at the protocol level, which is not
free. Two, whatever abstraction layer you're using to submit queries
to the database has to support it, and not all of them do. And three,
you get worse query plans if lack of knowledge about the specific
query parameters proves to be essential. I know you've done some work
on this last point for 9.2, but I'm fuzzy on the details and on how
much benefit we'll get out of it in real-world cases.

Interestingly, Peter Geoghegan's blog post on the pg_stat_statements
patch you just committed[1] claims that the overhead of fingerprinting
queries was only 1-2.5%, which is less than I would have thought, so
if we ever get to the point where we're fairly sure we've got problem
three licked, it might make sense to revisit this due to problems one
and two. It's also probably worth keeping in mind the next time we
bump the protocol version: it would be nice to have a way of doing
prepare-bind-execute in a single protocol message, which I believe to
be not possible at present.

[1] http://pgeoghegan.blogspot.com/2012/03/much-improved-statement-statistics.html

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2012-03-29 20:21:36 Re: Command Triggers patch v18
Previous Message Robert Haas 2012-03-29 20:08:04 Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)