Re: Query meltdown: caching results

From: Gordon <gordon(dot)mcvey(at)ntlworld(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Query meltdown: caching results
Date: 2008-02-26 15:29:22
Message-ID: 7c58e2dc-2aa1-4dc5-831c-632dc44abe71@d21g2000prf.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Feb 26, 11:11 am, Gordon <gordon(dot)mc(dot)(dot)(dot)(at)ntlworld(dot)com> wrote:
> I'm working on a CMS that, in addition to the database-stored version
> of articles for easy searching, sorting, etc, also stores a HTML file
> version so pages can be fetched with the minimum of overhead (browsing
> articles has no more overhead than accessing any other HTML file on
> the server).
>
> As I've been trying to keep the system modular I've taken to using an
> OO approach to the server side scripts, which are all written in PHP 5
> and use PDO for database access. I've also been using prepared
> sequences almost exclusively for security and performance reasons.
> I've tried to wrap sequences of queries in transactions as well, to
> eliminate the "every query is its own transaction" overhead.
>
> With previous projects which I wrote using structured programming
> methods it was quite easy to hold caches of results and keep database
> queries to a minimum, but I've found this extremely difficult to pull
> off when using the OO approach, and now it's starting to have some
> real performance consequences. The biggest one comes when publishing
> a document that has siblings. CMS content is organized in a tree with
> folders, subfolders and documents. A document can be published, where
> both a HTML and database copy exist, or unpublished, where only the
> database version exists, thus denying visitors to the site access to
> it. Documents in a folder get a sidebar with links to the other
> documents in the same folder, and when you change the published status
> of a document then all the other documents that are also published in
> that folder have to be republished in order to update their
> sidebars.
>
> This means fetching a list of all the documents with the same parent
> and that have a published flag status of true, using the text stored
> in the database to generate the HTML page and saving it to disk.
> Documents have an associated template, which also has to be fetched
> from the database. And all documents have data such as their path,
> which is a chain of the document's parents back to the root so that
> things like breadcrumbs can be generated.
>
> In the structured approach I'd have just cached stuff like the trail
> back to the root as I know it'll be the same for all documents, so I'd
> only have to run the sequences of queries to get the full trail once.
> But as each instance of a document is independent of all the others
> doing things like this is proving really difficult.
>
> I need to find a way of not running queries that I don't need to,
> either in the PHP script, or in the Postgres database. What I need is
> for a result set to be cached somewhere, either by Postgres or PHP, so
> when it sees the same query again in a given session it just returns
> the previously fetched result set. The cache also needs to be able to
> disregard its cached result sets when an event that changes a table
> occurs (insert, update, delete, etc).
>
> On the PHP side I've written a simple Database class that extends PDO
> and that I use in its place. It's a simple class that basically I use
> to allow me to nest calls to beginTransaction(), commit () and
> rollback () (It only starts an actual transaction of a counter is 0.
> Otherwide it just increments the counter. Commit only actually
> commits when the counter is 1, and decrements it otherwise. Rollback
> sets an error flag and decrements the counter, and only rolls back
> when the counter is 1. If the error flag is set then commit will
> actually roll back instead. )
>
> My options are, as far as I can tell,
>
> 1) replace the Database PDO extending class with something else that
> provides query results caching in PHP, or
> 2) get Postgres itself to cache the results of queries to avoid
> running them repeatedly during a session.
>
> I seem to remember MySQL providing some kind of results caching, can
> Postgres do the same? Has anyone else run into similar problems and
> how did they overcome them?

I have an idea for how to do it but I'm not quite sure how to
accomplish it fully. Aspects involving modifications to the tables
are going to be particularly problematic.

My idea is to extend the PDOStatement class with an internal result
cache. I'm already caching PDOStatements in order to prevent the
script from trying to prepare the same queries over and over again.
The cache will be an array. The execute(), fetch(). fetchall() etc
methods will be aware of the array and return values from it if
possible.

Things risk getting really tricky really quickly, however. If a
modification is made to a table, then any or all of the cached data in
all the PDOStatements may no longer be valid and will need to be
flushed. This is leading me to suspect that this is a far from ideal
way of doing things.

I know that Postgres can cache query plans, but what about results?
Can/do they get cached too?

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Norman Peelman 2008-02-26 15:32:25 Re: Query meltdown: caching results
Previous Message Hermann Muster 2008-02-26 15:07:23 dbi_link and dbi:ODBC