From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cache estimates, cache access cost
Date: 2011-05-17 19:14:28
Message-ID: BANLkTimZpDpPizxei4A-szs1HYb0Y8x_HA@mail.gmail.com
Lists: pgsql-hackers
On Sun, May 15, 2011 at 11:52 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Cédric Villemain wrote:
>>
>> http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache
>
> This rebases easily to make Cedric's changes move to the end; I just pushed
> a version with that change to
> https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone
> wants a cleaner one to browse. I've attached a patch too if that's more
> your thing.
Thank you. I don't much like sucking in other people's git repos - it
tends to take a lot longer than just opening a patch file, and if I
add the repo as a remote then my git repo ends up bloated. :-(
> The more important question is how to store the data collected and
> then use it for optimizing queries.
Agreed, but unless I'm missing something, this patch does nothing
about that. I think the first step needs to be to update all the
formulas that are based on random_page_cost and seq_page_cost to
properly take cache_page_cost into account - and in some cases it may
be a bit debatable what the right mathematics are.
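
To make the point concrete, here is one way such a formula could look. This is a hypothetical sketch, not PostgreSQL's actual cost math: the blending rule and the function name are assumptions, and only the parameter names (cache_page_cost, random_page_cost) come from the discussion above:

```python
# Hypothetical sketch of blending a cache_page_cost into per-page cost
# estimates. The linear interpolation here is an illustrative assumption,
# not the planner's real formula.

def effective_page_cost(cached_fraction,
                        cache_page_cost=0.1,
                        random_page_cost=4.0):
    """Expected cost of fetching one page, given an estimated
    probability that the page is already in the buffer cache."""
    return (cached_fraction * cache_page_cost
            + (1.0 - cached_fraction) * random_page_cost)
```

Even a simple rule like this shows where the debatable parts live: how cached_fraction is estimated, and whether a linear blend is the right shape at all.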
For what it's worth, I don't believe for a minute that an analyze
process that may only run on a given table every six months has a
chance of producing useful statistics about the likelihood that a
table will be cached. The buffer cache can turn over completely in
under a minute, and a minute is a lot less than a month. Now, if we
measured this information periodically for a long period of time and
averaged it, that might be a believable basis for setting an optimizer
parameter. But I think we should take the approach recently discussed
on performance: allow it to be manually set by the administrator on a
per-relation basis, with some reasonable default (maybe based on the
size of the relation relative to effective_cache_size) if the
administrator doesn't intervene. I don't want to be excessively
negative about the approach of examining the actual behavior of the
system and using that to guide system behavior - indeed, I think there
are quite a few places where we would do well to incorporate that
approach to a greater degree than we do currently. But I think that
it's going to take a lot of research, and a lot of work, and a lot of
performance testing, to convince ourselves that we've come up with an
appropriate feedback mechanism that will actually deliver better
performance across a large variety of workloads. It would be much
better, IMHO, to *first* get a cache_page_cost parameter added, even
if the mechanism by which caching percentages are set is initially
quite crude - that will give us a clear-cut benefit that people can
begin enjoying immediately.
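
As one crude possibility for the "reasonable default" mentioned above, the assumed cached fraction could fall off as a relation outgrows effective_cache_size. This is purely an illustrative sketch; the function and the falloff rule are my assumptions, not anything in the patch:

```python
# Hypothetical default for a per-relation cached fraction, used only
# when the administrator has not set one. The min() rule is a crude
# assumption: relations smaller than the cache are treated as fully
# cached, larger ones proportionally less so.

def default_cached_fraction(relation_pages, effective_cache_size_pages):
    """Guess what fraction of a relation is cached, based solely on
    its size relative to effective_cache_size."""
    if relation_pages <= 0:
        return 1.0
    return min(1.0, effective_cache_size_pages / relation_pages)
```

A default like this is obviously simplistic, but it would let a cache_page_cost parameter deliver benefit immediately, with smarter estimation layered on later.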
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company