Re: eWeek Poll: Which database is most critical to your

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: eWeek Poll: Which database is most critical to your
Date: 2002-02-27 05:32:21
Message-ID: 1014787941.2127.64.camel@rh72.home.ee
Lists: pgsql-hackers

On Wed, 2002-02-27 at 10:39, Tom Lane wrote:
> Neil Conway <nconway(at)klamath(dot)dyndns(dot)org> writes:
> > When processing INSERTs, UPDATEs and DELETEs, check if the query
> > would affect any of the tables for which we are maintaining this cache. If
> > so, flush the cache. This ensures that we will never return invalid
> > results.
>
> Note that this would imply that the cache is *global* across all
> backends; therefore it is a shared data structure and hence an access
> bottleneck. (Not to mention a memory-management headache, since the
> size of shared memory can't easily be varied on-the-fly.)

I think it would be enough if the hashes and bookkeeping info
(tmin, tmax, filename) were in a global table. We could also purge all
data more than a few minutes old. We also need an inverse lookup from
changed table to cached query, for cache invalidation on
insert/update/delete.
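
Roughly something like this - a sketch only, with made-up names and
plain ints standing in for the backend types:

    #include <time.h>

    typedef unsigned int TransactionId;  /* stand-in for backend type */
    typedef unsigned int Oid;            /* stand-in for backend type */

    /* Hypothetical shared bookkeeping entry for one cached result;
     * tmin/tmax bound the transactions the saved result is valid for. */
    typedef struct QueryCacheEntry
    {
        unsigned int  query_hash;   /* hash of the literal query text */
        TransactionId tmin;         /* entry valid from this xact on */
        TransactionId tmax;         /* set once a write invalidates it */
        char          filename[64]; /* temp file with wire-format result */
        time_t        created;      /* purge entries older than a few
                                     * minutes */
    } QueryCacheEntry;

    /* Inverse lookup: which cache entries read a given table, so that
     * insert/update/delete can invalidate them cheaply. */
    typedef struct TableCacheMap
    {
        Oid              relid;       /* the changed table */
        int              nentries;
        QueryCacheEntry *entries[16]; /* cached queries reading relid */
    } TableCacheMap;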

The results could even be saved in temp files, and would usually be
faster than doing the full parse/plan/execute, both for complex
queries returning a few rows (saves planning time) and for queries
returning many rows (saves execute time).

The format used for saving should be the exact wire protocol, so that
efficient system calls could be used where available (Linux 2.4+ has a
system call, sendfile(2), that will transfer a whole file to a socket
in one call, bypassing all copying and caching).
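
A minimal sketch of that push, assuming the cached result file and the
client socket are already open (error handling mostly elided):

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/sendfile.h>

    /* Push a cached wire-format result straight from the kernel page
     * cache to the client socket, with no copies through user space. */
    static int
    send_cached_result(int client_fd, int cache_fd)
    {
        struct stat st;
        off_t       offset = 0;

        if (fstat(cache_fd, &st) < 0)
            return -1;
        while (offset < st.st_size)
        {
            /* sendfile advances offset by the number of bytes sent */
            if (sendfile(client_fd, cache_fd, &offset,
                         st.st_size - offset) < 0)
                return -1;
        }
        return 0;
    }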

The lookup part will be pretty trivial: look up by hash, check
tmin/tmax, and if ok push the cached result out to the client. This
will make us as fast as or faster than MySQL for the
trivial_benchmark/busy_website case.
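
Using the hypothetical structures above, the fast path could look
about like this (hash_query_text() and cache_find() are assumed
helpers, and the snapshot test is simplified to a tmin/tmax check):

    /* Return the cache entry for a literal query string if it is
     * still valid for the caller's transaction, else NULL. */
    static QueryCacheEntry *
    lookup_cached_query(const char *query, TransactionId my_xmin)
    {
        unsigned int     h = hash_query_text(query);
        QueryCacheEntry *entry = cache_find(h);

        if (entry == NULL)
            return NULL;
        /* usable only if created before our snapshot and still valid */
        if (entry->tmin <= my_xmin && entry->tmax == 0)
            return entry;
        return NULL;
    }

On a hit, the backend would just push the entry's temp file out with
sendfile() as above.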

The cache creation/maintenance part will be much trickier.

When creating the cache:
* the tables affected can be determined only from fully built plans,
  because of possible rule expansions
* if there is a trigger on select for this query, it can't be cached
* put some temporary insert/update/delete triggers on all real tables
  used in the query that will invalidate the cache - alternatively, we
  could always run the invalidate-query-cache code for the affected
  table on insert/update/delete if exact caching is enabled (a sketch
  follows this list)
* invalidate the cache on schema changes
* run a periodic check and invalidate old cache entries
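
The invalidate-on-write path referenced above might then be little
more than this (again, all names are hypothetical, table_map_find()
being the assumed probe into the inverse map):

    #include <unistd.h>   /* unlink */

    /* Called on insert/update/delete (or from the temporary triggers)
     * to drop every cached result that reads the modified table. */
    static void
    invalidate_cache_for_table(Oid relid, TransactionId writer_xid)
    {
        TableCacheMap *map = table_map_find(relid);
        int            i;

        if (map == NULL)
            return;                 /* nothing cached reads this table */
        for (i = 0; i < map->nentries; i++)
        {
            map->entries[i]->tmax = writer_xid;  /* mark entry stale */
            unlink(map->entries[i]->filename);   /* drop the temp file */
        }
        map->nentries = 0;
    }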

Some of the above could also be needed for caching query plans.

> I cannot believe that caching results for literally-identical queries
> is a win, except perhaps for the most specialized (read brain dead)
> applications.

Actually, a web app that looks up the contents of a 5-row combo box
is not really brain-dead.

Doing all the caching (and cache invalidation) on the client side is
hard and fragile - what happens when someone adds a trigger in the
backend?

> Has anyone looked at the details of the test case that
> MySQL uses to claim that this is a good idea? Has it got any similarity
> to your own usage patterns?

Yes - for content management / web apps.

No - for bean-counting apps.

> We have talked about caching query plans for suitably-parameterized
> queries, but even that seems unlikely to be a huge win; at least I'd
> not think it useful to try to drive the cache totally automatically.
> If an application could say "here's a query I expect to use a lot,
> varying these specific parameters" then caching a plan for that would
> make sense.
>
> Now, there are notions of "prepared statements" in many access APIs
> that fit this description, and in fact the underlying capability exists
> in the backend --- we've just not gotten around to building the
> interfaces to tie it all together. *That* would be worth working on.

Sure. It would:

a) make many queries faster

b) make client libs (ODBC/JDBC/ECPG) faster and simpler by not forcing
them to fake it.
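
For illustration, the kind of splicing a driver has to do today when
it fakes a prepared statement - a hypothetical helper, not any actual
driver's code:

    #include <stdio.h>

    /* Fake "prepared" execution: rebuild the full SQL text on every
     * call, so the backend re-parses and re-plans it each time. */
    static void
    fake_prepared_exec(char *buf, size_t buflen, int id)
    {
        snprintf(buf, buflen,
                 "SELECT name FROM items WHERE id = %d", id);
        /* ... then send buf to the backend as an ordinary query ... */
    }

With real server-side prepared statements, the parse/plan would happen
once and only the parameter values would travel per call.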

But there is also a big class of applications that would benefit much
more from caching exact queries.

And it will make us as fast as MySQL for 100000 consecutive calls of
SELECT MAX(N) FROM T ;)

---------------
Hannu
