From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cache estimates, cache access cost
Date: 2011-05-17 19:14:28
Message-ID: BANLkTimZpDpPizxei4A-szs1HYb0Y8x_HA@mail.gmail.com
Lists: pgsql-hackers
On Sun, May 15, 2011 at 11:52 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Cédric Villemain wrote:
>>
>> http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache
>
> This rebases easily to make Cedric's changes move to the end; I just pushed
> a version with that change to
> https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone
> wants a cleaner one to browse. I've attached a patch too if that's more
> your thing.
Thank you. I don't much like sucking in other people's git repos - it
tends to take a lot longer than just opening a patch file, and if I
add the repo as a remote then my git repo ends up bloated. :-(
> The more important question is how to store the data collected and
> then use it for optimizing queries.
Agreed, but unless I'm missing something, this patch does nothing
about that. I think the first step needs to be to update all the
formulas that are based on random_page_cost and seq_page_cost to
properly take cache_page_cost into account - and in some cases it may
be a bit debatable what the right mathematics are.
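
To make the point concrete, here is one way such a formula could look. This is a hypothetical sketch, not PostgreSQL's actual cost math: the blending rule and the function name are assumptions, and only the parameter names (cache_page_cost, random_page_cost) come from the discussion above:

```python
# Hypothetical sketch of blending a cache_page_cost into per-page cost
# estimates. The linear interpolation here is an illustrative assumption,
# not the planner's real formula.

def effective_page_cost(cached_fraction,
                        cache_page_cost=0.1,
                        random_page_cost=4.0):
    """Expected cost of fetching one page, given an estimated
    probability that the page is already in the buffer cache."""
    return (cached_fraction * cache_page_cost
            + (1.0 - cached_fraction) * random_page_cost)
```

Even a simple rule like this shows where the debatable parts live: how cached_fraction is estimated, and whether a linear blend is the right shape at all.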
For what it's worth, I don't believe for a minute that an analyze
process that may only run on a given table every six months has a
chance of producing useful statistics about the likelihood that a
table will be cached. The buffer cache can turn over completely in
under a minute, and a minute is a lot less than a month. Now, if we
measured this information periodically for a long period of time and
averaged it, that might be a believable basis for setting an optimizer
parameter. But I think we should take the approach recently discussed
on performance: allow it to be manually set by the administrator on a
per-relation basis, with some reasonable default (maybe based on the
size of the relation relative to effective_cache_size) if the
administrator doesn't intervene. I don't want to be excessively
negative about the approach of examining the actual behavior of the
system and using that to guide system behavior - indeed, I think there
are quite a few places where we would do well to incorporate that
approach to a greater degree than we do currently. But I think that
it's going to take a lot of research, and a lot of work, and a lot of
performance testing, to convince ourselves that we've come up with an
appropriate feedback mechanism that will actually deliver better
performance across a large variety of workloads. It would be much
better, IMHO, to *first* get a cache_page_cost parameter added, even
if the mechanism by which caching percentages are set is initially
quite crude - that will give us a clear-cut benefit that people can
begin enjoying immediately.
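
As one crude possibility for the "reasonable default" mentioned above, the assumed cached fraction could fall off as a relation outgrows effective_cache_size. This is purely an illustrative sketch; the function and the falloff rule are my assumptions, not anything in the patch:

```python
# Hypothetical default for a per-relation cached fraction, used only
# when the administrator has not set one. The min() rule is a crude
# assumption: relations smaller than the cache are treated as fully
# cached, larger ones proportionally less so.

def default_cached_fraction(relation_pages, effective_cache_size_pages):
    """Guess what fraction of a relation is cached, based solely on
    its size relative to effective_cache_size."""
    if relation_pages <= 0:
        return 1.0
    return min(1.0, effective_cache_size_pages / relation_pages)
```

A default like this is obviously simplistic, but it would let a cache_page_cost parameter deliver benefit immediately, with smarter estimation layered on later.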
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company