Re: Query results caching?

From: Ben-Nes Yonatan <da(at)canaan(dot)co(dot)il>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>, Dann Corbit <DCorbit(at)connx(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query results caching?
Date: 2005-08-22 22:27:39
Message-ID: 430A515B.6080307@canaan.co.il
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> On Mon, Aug 22, 2005 at 10:13:49PM +0200, Ben-Nes Yonatan wrote:
>
>>I think that I was misunderstood, Ill make an example:
>>Lets say that im making the following query for the first time on the
>>"motorcycles" table which got an index on the "manufacturer" field:
>>
>>EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
>>manufacturer='suzuki';
>>... Total runtime: 3139.587 ms
>>
>>Now im doing the same query again and i get a much faster result (cause
>>of the "caching"): Total runtime: 332.53 ms
>>
>>After both of those queries I drop the index and query the table again
>>with the exact same query as before and now I receive: Total runtime:
>>216834.871 ms
>>
>>And for my last check I run the exact same query again (without creating
>>the INDEX back again) and I get quite similar result to my third query:
>>Total runtime: 209218.01 ms
>>
>>
>>My problem is that (maybe I just dont understand something basic
>>here...) the last 2 (also the second query but I dont care about that)
>>queries were using the "cache" that was created after the first query
>>(which had an INDEX) so none of them actually showed me what will happen
>>if a client will do such a search (without an INDEX) for the first time.
>>
>>I want to delete that "caching" after I do the first 2 queries so my
>>next queries will show me "real life results".
>
>

Ok I tried to handle both of your replies cause I got them at 2 seperate
emails.

Dann Corbit wrote:
> These results are all what I would expect. When you delete the index,
> the query will be forced to do a table scan (to examine every single
> record in the table one by one). If the table is non-trivial it is
> unlikely that either the OS or the database will cache the whole thing
> in memory. However, when you query a small record set, then it is
> likely to be retained in RAM which is literally thousands of times
> faster than disk.

Didnt know that, good to know though doesnt assure me...
What if I drop the INDEX but create a diffrent INDEX which also make the
process alot faster then without an INDEX but slower/faster then the one
before, will it wont use the former "caching"?

> If a search is to be made on a frequent basis, you should create an
> index.
> The query results above show you why.

Obvious :)

> Think about this for a minute. The real life results you want are
> very fast results. For that reason, you should try to model the
> customer queries as nearly as possible. If you have a canned
> application like order entry, then the real parameterized query set
> will probably be quite small in real life. If you are creating a
> server for ad-hoc queries then it will be far more difficult to model
> in real life.
>
> What is the real purpose of the application that you are writing?
>
> Will users be using a pre-programmed front end, or will they be typing
> in queries free-form for whatever their heart desires?

Ok ill try to describe the system as short & precise as possible (its
also passed midnight here :)).
Each day I receive about 4 million rows of data (products) which I
insert into table1 (after I delete all of the previous data it had),
along it I receive for every row about another 15 keywords which I
insert into table2 (where as in table1 I delete all of the previous data
it had also), this process is a fact that I cant change.
Now the users of the site can search for data from table1 by typing
whichever (and up to 4) words as they want at a text field (search input
string) and the server should display the correct results by querying
table1 & join table2 for its keywords.
I succeded to do it quite fast but when I tried to ORDER BY my results
its times jumped up drastically (2-3 seconds for a query... and thats
after the caching..).
I can't allow a situation where a user will search with a keyword which
wasnt 'cached' before and because of that he will wait 15 seconds for a
result.

Jim C. Nasby wrote:
> Emptying the cache will not show real-life results. You are always going
> to have some stuff cached, even if you get a query for something new. In
> this case (since you'll obviously want those indexes there), after some
> amount of time you will have most (if not all) of the non-leaf index
> pages cached, since they take a fairly small amount of memory and are
> frequently accessed. This makes index traversal *much* faster than your
> initial case shows, even if you query on something different each time.
> Testing with a completely empty cache just isn't that realistic.

As far as I understand it at my situation where all of the data is
deleted and inserted each day from the start (INDEX will get lost with
it..) & the endless variety of possible keywords search's & the immense
size of the tables, the following reason wont last.. or am I wrong here?

Because of all of that I want to be able to see how much time a query
takes when its the first time its being run..... or I'm wrong again and
failing to understand something?

Again everyone THANKS ALOT its really amazing the help that I receive
from you!
Ben-Nes Yonatan

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ben-Nes Yonatan 2005-08-22 22:28:28 Re: Query results caching?
Previous Message Dann Corbit 2005-08-22 22:24:52 Re: Query results caching?