Re: WIP: cross column correlation ...

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: cross column correlation ...
Date: 2011-02-25 23:41:09
Message-ID: 4D683E15.4090409@agliodbs.com
Lists: pgsql-hackers
> 4. Even if we could accurately estimate the percentage of the table
> that is cached, what then?  For example, suppose that a user issues a
> query which retrieves 1% of a table, and we know that 1% of that table
> is cached.  How much of the data that the user asked for is cached?

FWIW, for a manual override setting, I was thinking that the % would
convert to a probability.  That way, it wouldn't be any different from
the existing random_page_cost (RPC) calculation; we're just estimating
how *likely* it is that the data the user wants is cached.
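As a rough illustration of treating the override percentage as a probability: if p is the assumed chance that a page is already cached, the expected cost of a page fetch becomes a weighted average of a cached-page cost and random_page_cost. This is a minimal sketch, not how the planner actually computes costs; `expected_page_cost` and the `cached_page_cost` parameter are hypothetical (PostgreSQL has no such setting), and only the 4.0 default for random_page_cost matches stock PostgreSQL.

```python
def expected_page_cost(cached_probability: float,
                       random_page_cost: float = 4.0,
                       cached_page_cost: float = 0.1) -> float:
    """Expected cost of one random page fetch, given the chance it is cached.

    random_page_cost defaults to PostgreSQL's stock value of 4.0.
    cached_page_cost is a hypothetical knob, not a real PostgreSQL setting.
    """
    if not 0.0 <= cached_probability <= 1.0:
        raise ValueError("cached_probability must be between 0 and 1")
    # Weighted average: a cached page costs cached_page_cost, an uncached
    # page costs the usual random_page_cost.
    return (cached_probability * cached_page_cost
            + (1.0 - cached_probability) * random_page_cost)
```

For example, a manual override of 50% with `cached_page_cost=0.0` halves the per-page cost from 4.0 to 2.0.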

> One idea Tom and I kicked around previously is to set an assumed
> caching percentage for each table based on its size relative to
> effective_cache_size - in other words, assume that the smaller a table
> is, the more of it will be cached.  Consider a system with 8GB of RAM,
> and a table which is 64kB.  It is probably unwise to make any plan
> based on the assumption that that table is less than fully cached.  If
> it isn't before the query executes, it soon will be.  Going to any
> amount of work elsewhere in the plan to avoid the work of reading that
> table in from disk is probably a dumb idea.  Of course, one downside
> of this approach is that it doesn't know which tables are hot and
> which tables are cold, but it would probably still be an improvement
> over the status quo.
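One crude way to express the size-based heuristic quoted above, assuming (purely for illustration) that a table no larger than effective_cache_size is fully cached and a larger table is cached in proportion to effective_cache_size divided by its size. The function name and formula are hypothetical; the thread does not specify one, and like the quoted proposal it knows nothing about which tables are hot or cold.

```python
def assumed_cached_fraction(table_bytes: int, effective_cache_bytes: int) -> float:
    """Hypothetical heuristic: the smaller a table is relative to the cache,
    the more of it is assumed to be cached."""
    if effective_cache_bytes <= 0:
        raise ValueError("effective_cache_bytes must be positive")
    if table_bytes <= effective_cache_bytes:
        # Fits within the assumed cache: treat it as fully cached.
        return 1.0
    # Larger than the cache: assume only a cache-sized portion is resident.
    return effective_cache_bytes / table_bytes


KB = 1024
GB = 1024 ** 3
# The example from the quoted text: a 64kB table on a machine with 8GB.
print(assumed_cached_fraction(64 * KB, 8 * GB))   # 1.0
print(assumed_cached_fraction(32 * GB, 8 * GB))   # 0.25
```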

Actually, we *do* have some idea which tables are hot.  Or at least, we
could.  Currently, the table statistics in the pg_stat_* views are
"timeless"; they just accumulate from the last reset, which has always
been a problem for monitoring in general.  If we could make top-level
table and index stats time-based, even in some crude way, we would know
which tables are currently hot.  That would also make server performance
analysis and autotuning easier.
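A crude form of time-based stats can be derived from the existing cumulative counters by sampling them periodically and looking at the deltas. This sketch assumes each snapshot is a plain dict mapping table name to a cumulative count (for example, heap_blks_read + heap_blks_hit summed per table from pg_statio_user_tables); `access_rates` and `hottest` are hypothetical helpers, not existing PostgreSQL functions.

```python
from typing import Dict, List, Tuple


def access_rates(prev: Dict[str, int], curr: Dict[str, int],
                 elapsed_seconds: float) -> Dict[str, float]:
    """Per-table accesses per second between two snapshots of cumulative counters."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates: Dict[str, float] = {}
    for table, now in curr.items():
        before = prev.get(table, 0)  # a table new since the last sample starts at 0
        # A counter lower than before means the stats were reset in between;
        # in that case only count what accumulated since the reset.
        delta = now - before if now >= before else now
        rates[table] = delta / elapsed_seconds
    return rates


def hottest(rates: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """The n tables with the highest recent access rate."""
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]


prev = {"orders": 100, "customers": 5000}
curr = {"orders": 700, "customers": 5060, "audit_log": 30}
rates = access_rates(prev, curr, elapsed_seconds=60)
print(hottest(rates, 2))  # [('orders', 10.0), ('customers', 1.0)]
```

Ranking by recent rate rather than lifetime totals is what distinguishes a table that is hot now from one that was merely busy at some point since the last reset.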

> But DBAs
> frequently have a very good idea of which stuff is in cache - they can
> make observations over a period of time and then adjust settings and
> then observe some more and adjust some more.

Agreed.

-- 
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com
