Re: [WIP] cache estimates, cache access cost

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: stark(at)mit(dot)edu, cedric(dot)villemain(dot)debian(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [WIP] cache estimates, cache access cost
Date: 2011-06-20 00:30:24
Message-ID: 4DFE94A0.3030009@2ndQuadrant.com
Lists: pgsql-hackers

On 06/19/2011 06:15 PM, Kevin Grittner wrote:
> I think the point is that if, on a fresh system, the first access to
> a table is something which uses a table scan -- like select count(*)
> -- that all indexed access would then tend to be suppressed for that
> table. After all, for each individual query, selfishly looking at
> its own needs in isolation, it likely *would* be faster to use the
> cached heap data.
>

If those accesses can compete with other activity, such that the data
really does stay in the cache rather than being evicted, then what's
wrong with that? We regularly have people stop by asking for how to pin
particular relations to the cache, to support exactly this sort of scenario.

What I would expect on any mixed workload is that the table would
slowly get holes shot in it, as individual sections are evicted in
favor of more popular index data. Eventually there'd be little enough
of the table left in cache that an index scan would win again. But if
people keep using the copy of the table in memory instead, enough so
that it never really falls out of cache, that's not necessarily even a
problem--it could be considered a solution for some.

The possibility that people can fit their entire table into RAM, and
that it never leaves there, is turning downright probable in some use
cases now. A good example is cloud instances on EC2, where people
often architect their systems so that the data set put onto any one
node fits into RAM. As soon as that's not true, you suffer too much
from disk issues, so breaking the database into RAM-sized pieces turns
out to be very good practice. It's possible to tune fairly well for
this case right now--just make the page costs all low. The harder case
I see a lot is where all the hot data fits into cache, but there's a
table or two of history/archives that don't. And that case would be
easier to handle correctly given this sort of "what's in the cache?"
percentage information.
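
To make that concrete, here's a minimal sketch of the page cost tuning
described above, assuming a hypothetical "archive" tablespace and
"order_history" table for the cold data (the names, path, and exact
values are illustrative only):

  -- Hot data fits in RAM: tell the planner that fetching pages is cheap.
  -- Shown with SET for illustration; normally these would be set in
  -- postgresql.conf.
  SET seq_page_cost = 0.1;
  SET random_page_cost = 0.1;

  -- Keep disk-like costs for the big history/archive tables by moving
  -- them to their own tablespace and overriding the costs there.
  CREATE TABLESPACE archive LOCATION '/mnt/archive/pgdata';
  ALTER TABLESPACE archive SET (seq_page_cost = 1.0, random_page_cost = 4.0);
  ALTER TABLE order_history SET TABLESPACE archive;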

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
