Re: Query results caching?

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Ben-Nes Yonatan" <da(at)canaan(dot)co(dot)il>, "Sean Davis" <sdavis2(at)mail(dot)nih(dot)gov>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query results caching?
Date: 2005-08-22 19:26:58
Message-ID: D425483C2C5C9F49B5B7A41F8944154757CFF6@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> -----Original Message-----
> From: Ben-Nes Yonatan [mailto:da(at)canaan(dot)co(dot)il]
> Sent: Monday, August 22, 2005 1:14 PM
> To: Sean Davis; Dann Corbit
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Query results caching?
>
> Sean Davis wrote:
> > On 8/22/05 1:59 PM, "Dann Corbit" <DCorbit(at)connx(dot)com> wrote:
> >
> >
> >>
> >>>-----Original Message-----
> >>>From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> >>>owner(at)postgresql(dot)org] On Behalf Of Ben-Nes Yonatan
> >>>Sent: Monday, August 22, 2005 9:03 AM
> >>>To: pgsql-general(at)postgresql(dot)org
> >>>Subject: [GENERAL] Query results caching?
> >>>
> >>>Hi all,
> >>>
> >>>I dont know how its called but I noticed that when I query the db
for
> >>>the first time it give me the result slower then the next times ill
> >>>repeat the same exact query, I figure that its some kind of caching
so
> >>>henceforth the title of the mail :)
> >>
> >>The operating system and the database will both percolate frequently
> >>used information from disk into memory. Particularly if they are
SELECT
> >>queries, they will get faster and faster.
> >>
> >>
> >>>Anyway I would want to be able to delete that "caching" after every
> >>>query test that I run, cause I want to see the real time results
for
> >>
> >>my
> >>
> >>>queries (its for a searching option for users so it will vary
alot).
> >>
> >>Those are the real times for your queries.
> >>
> >>
> >>>Is it possible to do it manually each time or maybe only from the
> >>>configuration?
> >>
> >>You will have to query a different table each time.
> >
> >
> > Just to extend this notion a bit, if you want to test your
application
> > speed, you may want to generate "real-world" input to determine the
> actual
> > behavior/speed under real conditions. As Dann pointed out, the
results
> for
> > timings are "real" in that if the user generated the queries as you
did,
> the
> > timing results would be (nearly) the same as for you. It seems that
> your
> > concern is that the user will not generate the same type of input
that
> you
> > did (that it will vary more), so the best solution may be to
actually
> > generate some test queries that actually conform to what you think
the
> user
> > input will look like.
> >
> > Sean
> >
> I think that I was misunderstood, Ill make an example:
> Lets say that im making the following query for the first time on the
> "motorcycles" table which got an index on the "manufacturer" field:
>
> EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
> manufacturer='suzuki';
> ... Total runtime: 3139.587 ms
>
> Now im doing the same query again and i get a much faster result
(cause
> of the "caching"): Total runtime: 332.53 ms
>
> After both of those queries I drop the index and query the table again
> with the exact same query as before and now I receive: Total runtime:
> 216834.871 ms
>
> And for my last check I run the exact same query again (without
creating
> the INDEX back again) and I get quite similar result to my third
query:
> Total runtime: 209218.01 ms

These results are all what I would expect. When you delete the index,
the query will be forced to do a table scan (to examine every single
record in the table one by one). If the table is non-trivial it is
unlikely that either the OS or the database will cache the whole thing
in memory. However, when you query a small record set, then it is
likely to be retained in RAM which is literally thousands of times
faster than disk.

> My problem is that (maybe I just dont understand something basic
> here...) the last 2 (also the second query but I dont care about that)
> queries were using the "cache" that was created after the first query
> (which had an INDEX) so none of them actually showed me what will
happen
> if a client will do such a search (without an INDEX) for the first
time.

If a search is to be made on a frequent basis, you should create an
index.
The query results above show you why.

> I want to delete that "caching" after I do the first 2 queries so my
> next queries will show me "real life results".

Think about this for a minute. The real life results you want are very
fast results. For that reason, you should try to model the customer
queries as nearly as possible. If you have a canned application like
order entry, then the real parameterized query set will probably be
quite small in real life. If you are creating a server for ad-hoc
queries then it will be far more difficult to model in real life.

What is the real purpose of the application that you are writing?

Will users be using a pre-programmed front end, or will they be typing
in queries free-form for whatever their heart desires?


> Thanks alot again,
> Yonatan

Browse pgsql-general by date

  From Date Subject
Next Message Jim C. Nasby 2005-08-22 19:51:49 Re: Query results caching?
Previous Message Jim C. Nasby 2005-08-22 18:38:04 Re: Testing of MVCC