Re: Again, sorry, caching.

From: "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
To: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
Cc: mlw <markw(at)mohawksoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Again, sorry, caching.
Date: 2002-03-19 18:12:52
Message-ID: 20020319181252.GI31839@rice.edu
Lists: pgsql-hackers

On Mon, Mar 18, 2002 at 09:35:51PM -0500, Neil Conway wrote:
> >
> > "It is an application issue"
> > This is completely wrong. Caching can not be done against a database without
> > knowledge of the database, i.e. when the data changes.
>
> But can't this be achieved by using a LISTEN/NOTIFY model, with
> user-created rules to NOTIFY the appropriate listener when a table
> changes? With a good notification scheme like this, you don't need to
> continually poll the DB for changes. You don't need to teach your cache
> a lot of things about the database, since most of that knowledge is
> encapsulated inside the rules, and supporting tables.
>
> My impression (I could be wrong) is that LISTEN/NOTIFY doesn't get the
> press that it deserves. If this model isn't widely used because of some
> deficiencies in the LISTEN/NOTIFY implementation, IMHO our time would be
> better spent fixing those problems than implementing the proposed
> caching scheme.
>
> If we're looking to provide a "quick and easy" caching scheme for users
> attracted to MySQL's query cache, why not provide this functionality
> through another application? I'm thinking about a generic "caching
> layer" that would sit in between Postgres and the database client. It
> could speak the FE/BE protocol as necessary; it would use LISTEN/NOTIFY
> to allow it to efficiently be aware of database changes; it would create
> the necessary rules for the user, providing a simple interface to
> enabling query caching for a table or a set of tables?
>
> What does everyone think?

Neil, this sounds like exactly the approach to follow up on: the one part
of caching that _is_ the backend's domain is knowing about invalidation
events. And LISTEN/NOTIFY has _exactly_ the right behavior for that -
you don't receive NOTIFYs from uncommitted transactions, for example. As
it stands, the application developer has to have intimate knowledge of
the schema to set up the correct NOTIFY triggers for any given query.
That works against developing a generic middleware solution, since the
middleware would have to parse the SQL itself to guess at the affected
tables.
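To make the pain concrete, here's a sketch of the hand-rolled setup an
application developer needs today for even a single table (all names are
illustrative; in current PostgreSQL a rule with a NOTIFY action is the
simplest way to fire the event on writes):

```sql
-- Hypothetical names throughout. One rule per table, per statement
-- type, hand-maintained by the application developer:
CREATE RULE baz_ins_notify AS ON INSERT TO baz DO NOTIFY baz_changed;
CREATE RULE baz_upd_notify AS ON UPDATE TO baz DO NOTIFY baz_changed;
CREATE RULE baz_del_notify AS ON DELETE TO baz DO NOTIFY baz_changed;

-- The caching client must already know its query touches baz:
LISTEN baz_changed;
SELECT foo, bar FROM baz;
-- On receiving the baz_changed notification (after the writer
-- commits), the client drops its cached result set.
```

Multiply that by every table each cached query touches, and by every
query, and the need for backend help in discovering the affected tables
is obvious.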

How about an extension that autocreates INSERT/UPDATE/DELETE triggers
that send NOTIFYs, based on all tables accessed by a given SELECT? As
an example, I could see extending the Zope PsycoPG database adaptor,
(which already tries some simple timeout based caching) to tack on
something like:

SELECT foo,bar FROM baz CACHENOTIFY <notifyname>

whenever it creates a cache for a given query, then setting up the correct
LISTEN to invalidate that cache. Alternatively, the LISTEN could be
automatic. The name might be autogenerated as well, to avoid collision
problems. Or perhaps _allow_ collisions to extend the notification
set? (I could see having _all_ the queries that generate one web page
NOTIFY together, since the entire page needs to be regenerated on cache
invalidation.)
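Since CACHENOTIFY doesn't exist, here is a sketch of what the extension
might do behind the scenes for that example (names are illustrative, and
the expansion into rules is one possible implementation, not a settled
design):

```sql
-- Proposed (hypothetical) syntax: the backend notes that this query
-- reads only table baz, and autocreates the invalidation events:
SELECT foo, bar FROM baz CACHENOTIFY page_cache_42;

-- ...which would be roughly equivalent to the backend running:
CREATE RULE baz_page_42_ins AS ON INSERT TO baz DO NOTIFY page_cache_42;
CREATE RULE baz_page_42_upd AS ON UPDATE TO baz DO NOTIFY page_cache_42;
CREATE RULE baz_page_42_del AS ON DELETE TO baz DO NOTIFY page_cache_42;

-- The middleware (or the backend, if the LISTEN is automatic) then:
LISTEN page_cache_42;
```

Note that a shared event name across several CACHENOTIFY queries gives
the "whole page invalidates together" behavior for free.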

Then, the existing interface to SQL queries would allow the app developer
to set logical caching policies for each query, independently. The backend
does only the part that it alone can do: determine all the tables touched
by a given query. The middleware and app developer are then free to cache
at the appropriate level (SQL result set, fully formatted web page, etc.)
This clearly is only useful in a connection pooling environment,
so that long-lived backends are around to receive the NOTIFYs. Hmm, no -
I think it would be possible with this to have a separate process do
the LISTEN and cache invalidation, while a pool of other backends is
used for general access, no?
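That dedicated-listener arrangement might look something like this
(event names are illustrative; this is a sketch of the session roles,
not a worked implementation):

```sql
-- Connection held open by the cache-manager process only:
LISTEN page_cache_42;
LISTEN page_cache_43;
-- It blocks waiting for notifications; on each one it drops the
-- matching cache entry. It never runs application queries itself.

-- Meanwhile, pooled worker connections run queries normally; any
-- committed write to a watched table fires the autocreated rules
-- and wakes the listener:
UPDATE baz SET bar = bar + 1 WHERE foo = 'k';
```

The pooled backends stay short-lived and stateless, while one cheap
long-lived connection carries all the invalidation traffic.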

Seems like a win all around. Anyone else have comments? How insane
would the auto trigger creation get? It seems to me that this would be
similar in spirit to the referential integrity work, but more dynamic,
since simple SELECTs would be creating backend triggers. There's
potential for DoS attacks, for example - but not much worse, I suppose,
than firing off big nasty Cartesian cross-product queries.

Ross
