Re: Postgresql Caching

From: Harvell F <fharvell(at)file13(dot)info>
To: Shane Ambler <pgsql(at)007Marketing(dot)com>
Cc: mark(at)mark(dot)mielke(dot)cc, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgresql Caching
Date: 2006-10-16 14:55:05
Message-ID: ACE29058-CD5E-41D2-A356-FA140A1CA4EA@file13.info
Lists: pgsql-hackers


On 16 Oct 2006, at 4:29, Shane Ambler wrote:

> Harvell F wrote:
>
>> Getting back to the original posting, as I remember it, the
>> question was about seldom changed information. In that case, and
>> assuming a repetitive query as above, a simple query results cache
>> that is keyed on the passed SQL statement string and that simply
>> returns the previously cooked result set would be a really big
>> performance win.
>
> I believe the main point that Mark made was that the extra overhead
> is in the SQL parsing and query planning - this is the part that
> postgres won't get around. Even if you set up simple tables for
> caching it still goes through the parser and planner and loses the
> benefits that memcached has. Or you fork those requests before the
> planner and lose the benefits of postgres.
> The main benefit of using memcached is to bypass the parsing and
> query planning.

That was the basis of my suggestion to just use the passed query
string as the key. No parsing or processing of the query, just a
simple string match.
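
To make that concrete, here's a rough sketch (in Python, purely
illustrative; the names are made up and none of this reflects actual
backend code) of how trivial the hit path could be when the raw query
text is the key:

    # Purely illustrative: a results cache keyed on the raw SQL text.
    # No parsing at all -- the incoming string is the key, the cooked
    # result set is the value.
    results_cache = {}

    def execute(query, run_query):
        # run_query stands in for the normal parse/plan/execute path
        if query in results_cache:
            return results_cache[query]   # hit: no parse, no plan
        result = run_query(query)         # miss: full pipeline
        results_cache[query] = result
        return result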

> You will find there is more to SQL parsing than you first think. It
> needs to find the components that make up the SQL statement (tables,
> column names, functions) and check that they exist and can be used
> in the context of the given SQL, and that the given data matches the
> context in which it is to be used. It needs to check that the
> current user has enough privileges to perform the requested task. It
> then locates the data, whether it be in the memory cache, on disk,
> or an integrated version of memcached; this also includes checks to
> make sure another user hasn't locked the data to change it, and
> whether there exists more than one version of the data, committed
> and uncommitted. Then it sends the results back to the client
> requesting it.

The user permissions checking is a potential issue but again, for
the special case of repeated queries by the same user (the webserver
process) for the same data, a simple match on the original query
string _and_ the original query user would still be very simple.
The big saving from the simple results cache would be the
elimination of the parsing, planning, locating, combining, and
sorting of the result set.
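
Something like this (again just an illustrative Python sketch with
made-up names) -- the key simply grows to include the role:

    # Illustrative only: key on (user, query text) so a cached result
    # is handed back only to the role that originally ran the query,
    # sidestepping the permissions question for the webserver case.
    results_cache = {}

    def execute(user, query, run_query):
        key = (user, query)
        if key in results_cache:
            return results_cache[key]
        result = run_query(user, query)
        results_cache[key] = result
        return result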

I don't believe normal locking plays a part in the cache (there
are basic cache-integrity locking issues, though), nor do the
versioning or commit states, beyond the invalidation of the cache
upon a commit to a referenced table. It may be that the invalidation
needs to happen whenever a table is locked as well. (The hooks for
the invalidation would be installed during the original caching of
the result set.)
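
Roughly (another illustrative Python sketch, not real backend code):
at cache-fill time each entry records the tables it read, and a
commit to (or lock of) any of those tables drops the affected
entries:

    from collections import defaultdict

    results_cache = {}
    keys_by_table = defaultdict(set)      # table name -> cache keys

    def cache_fill(key, tables_read, result):
        # hook installed when the result set is first cached
        results_cache[key] = result
        for table in tables_read:
            keys_by_table[table].add(key)

    def invalidate(table):
        # called on commit to (or lock of) a referenced table
        for key in keys_by_table.pop(table, ()):
            results_cache.pop(key, None)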

I know that the suggestion is a very simple-minded one and is
limited to a very small subset of the potential query types and
interactions; however, at least for web applications, it would be a
very big win. Many websites want to display today's data on their
webpage and have it change as dates change (or as users change). The
data in the source table doesn't change very often (especially
compared to how often a popular website is hit), and the number of
times the exact same query is issued between changes can run into
the hundreds of thousands or more. Putting even this simple results
cache into the database would really simplify the programmer's life
and improve reliability (and the use of PostgreSQL).
