Re: [PATCHES] ARC Memory Usage analysis

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Kenneth Marshall <ktm(at)is(dot)rice(dot)edu>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, josh(at)agliodbs(dot)com
Subject: Re: [PATCHES] ARC Memory Usage analysis
Date: 2004-10-26 08:49:15
Message-ID: 1098780555.6807.136.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches pgsql-performance

On Mon, 2004-10-25 at 16:34, Jan Wieck wrote:
> The problem is, with a too small directory ARC cannot guesstimate what
> might be in the kernel buffers. Nor can it guesstimate what recently was
> in the kernel buffers and got pushed out from there. That results in a
> way too small B1 list, and therefore we don't get B1 hits when in fact
> the data was found in memory. B1 hits is what increases the T1target,
> and since we are missing them with a too small directory size, our
> implementation of ARC is probably using a T2 size larger than the
> working set. That is not optimal.

I think I have seen that the T1 list shrinks "too much", but I need
more tests - with some good test results - to confirm that.

The effectiveness of ARC relies upon the balance between the often
conflicting requirements of "recency" and "frequency". It seems
possible, even likely, that pgsql's version of ARC may need some subtle
changes to rebalance it - if we are unlucky enough to find cases where
it genuinely is out of balance. Many performance tests are required,
together with a few ideas on extra parameters to include... hence my
support of Jan's ideas.

That's also why I called the B1+B2 hit ratio "turbulence" because it
relates to how much oscillation is happening between T1 and T2. In
physical systems, we expect the oscillations to be damped, but there is
no guarantee that we have a nearly critically damped oscillator. (Note
that the absence of turbulence doesn't imply that T1+T2 is optimally
sized, just that it is balanced).
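
For anyone following along, the adaptation rule at the heart of this -
as given in the Megiddo/Modha ARC paper, so illustrative C rather than
our actual buffer manager code - is:

/* Illustrative only: ARC's adaptation rule from the paper. A hit in
 * B1 ("we evicted this from T1 too soon") grows the T1 target; a hit
 * in B2 shrinks it. The "turbulence" above is simply how rapidly
 * t1_target oscillates. */
int c;              /* total cache capacity (|T1| + |T2|)       */
int t1_target;      /* adaptive target for T1 (the paper's "p") */
int b1_len, b2_len; /* current ghost list lengths               */

static int imax(int a, int b) { return (a > b) ? a : b; }
static int imin(int a, int b) { return (a < b) ? a : b; }

void
on_b1_hit(void)
{
    int delta = imax(b2_len / imax(b1_len, 1), 1);

    t1_target = imin(t1_target + delta, c);
}

void
on_b2_hit(void)
{
    int delta = imax(b1_len / imax(b2_len, 1), 1);

    t1_target = imax(t1_target - delta, 0);
}

A too-small B1 - Jan's point above - means on_b1_hit() almost never
fires, so t1_target stays pinned low and T2 grows larger than the
working set.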

[...and although the discussion has wandered away from my original
patch...would anybody like to commit, or decline the patch?]

> If we would replace the dynamic T1 buffers with a max_backends*2 area of
> shared buffers, use a C value representing the effective cache size and
> limit the T1target on the lower bound to effective cache size - shared
> buffers, then we basically moved the T1 cache into the OS buffers.

Limiting the minimum size of T1 to 2 * max_backends sounds like an
easy way to prevent overbalancing towards T2, but I would like to
follow up on ways to have T1 naturally stay larger. I'll do a patch
with this idea in, for testing. I'll call it "T1 minimum size" so we
can discuss it.
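
As a strawman, the change amounts to little more than a clamp on the
adaptive target (illustrative names, not the actual buffer manager
symbols):

/* "T1 minimum size": never let adaptation squeeze T1 below
 * 2 * MaxBackends. Illustrative sketch only. */
extern int MaxBackends;

int
clamp_t1_target(int t1_target)
{
    int t1_min = 2 * MaxBackends;

    return (t1_target < t1_min) ? t1_min : t1_target;
}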

Any other patches are welcome...

It could be that B1 is too small and so we could use a larger value of C
to keep track of more blocks. I think what is being suggested is two
GUCs: shared_buffers (as is), plus another one, larger, which would
allow us to track what is in shared_buffers and what is in OS cache.
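
To make that concrete, here is a sketch - note that the second GUC
name is invented for illustration, it is not an actual parameter:

/* Sketch of the two-GUC idea. "buffer_track_size" is an invented
 * name for the proposed larger GUC; only shared_buffers exists
 * today. The ARC directory would hold buffer_track_size entries, so
 * B1/B2 can remember blocks that now live only in the kernel cache. */
int shared_buffers    = 10000;  /* buffers actually allocated (T1+T2) */
int buffer_track_size = 50000;  /* invented: entries to track (+B1+B2) */

int
arc_directory_entries(void)
{
    /* never track fewer than 2 * shared_buffers, ARC's usual 2c */
    return (buffer_track_size > 2 * shared_buffers)
            ? buffer_track_size
            : 2 * shared_buffers;
}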

I have comments on "effective cache size" below....

On Mon, 2004-10-25 at 17:03, Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> > This all only holds water, if the OS is allowed to swap out shared
> > memory. And that was my initial question, how likely is it to find this
> > to be true these days?
>
> I think it's more likely than not that the OS will consider shared
> memory to be potentially swappable. On some platforms there is a shmctl
> call you can make to lock your shmem in memory, but (a) we don't use it
> and (b) it may well require privileges we haven't got anyway.

Are you saying we shouldn't, or that we don't yet? I simply assumed
that we did use that function - surely it must at least be an option?
RHEL supports it, for one...

It may well be that we don't have those privileges, in which case we
turn off the option. Often, we (or I?) will want to install a dedicated
server, so we should have all the permissions we need, in which case...
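
For reference, the call itself is a one-liner on platforms that have
it; this Linux-flavoured sketch shows the fallback behaviour (it
generally needs root or CAP_IPC_LOCK, hence "turn off the option"):

#include <sys/shm.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to pin the SysV shared memory segment in RAM; fall back
 * gracefully if we lack the privilege. Linux-specific sketch. */
int
lock_shared_memory(int shmid)
{
    if (shmctl(shmid, SHM_LOCK, NULL) < 0)
    {
        fprintf(stderr, "shmctl(SHM_LOCK) failed: %s\n",
                strerror(errno));
        return -1;      /* run with swappable shared memory */
    }
    return 0;           /* segment is now exempt from swapping */
}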

> This has always been one of the arguments against making shared_buffers
> really large, of course --- if the buffers aren't all heavily used, and
> the OS decides to swap them to disk, you are worse off than you would
> have been with a smaller shared_buffers setting.

Not really, just an argument against making them *too* large. Large
*and* utilised is OK, so we need ways of judging optimal sizing.

> However, I'm still really nervous about the idea of using
> effective_cache_size to control the ARC algorithm. That number is
> usually entirely bogus. Right now it is only a second-order influence
> on certain planner estimates, and I am afraid to rely on it any more
> heavily than that.

...ah yes, effective_cache_size.

The manual describes effective_cache_size as if it had something to do
with the OS, and some of this discussion has picked up on that.

effective_cache_size is used in only two places in the code (both in the
planner), as an estimate for calculating the cost of a) nonsequential
access and b) index access, mainly as a way of avoiding overestimates of
access costs for small tables.

There is absolutely no implication in the code that effective_cache_size
measures anything in the OS; what it gives is an estimate of the number
of blocks that will be available from *somewhere* in memory (i.e. in
shared_buffers OR OS cache) for one particular table (the one currently
being considered by the planner).

Crucially, the "size" referred to is the size of the *estimate*, not
the size of the OS cache (nor the size of the OS cache +
shared_buffers). So setting effective_cache_size = total available
memory, or effective_cache_size = total memory - shared_buffers, or
making any other assumption that directly links memory size to that
parameter, is wildly off the mark. Talking about effective_cache_size
as if it were the OS cache isn't the right thing to do.

...It could be that we use a very high % of physical memory as
shared_buffers - in which case the effective_cache_size would represent
the contents of shared_buffers.

Note also that the planner assumes that all tables are equally likely to
be in cache. Increasing effective_cache_size in postgresql.conf seems
destined to give the wrong answer in planning unless you absolutely
understand what it does.

I will submit a patch to correct the description in the manual.

Further comments:
The two estimates appear to use effective_cache_size differently:
a) assumes that a table of size effective_cache_size will be 50% in
cache
b) assumes that effective_cache_size blocks are available, so a table
of size == effective_cache_size will be 100% available
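
Paraphrasing the equations in costsize.c for (a) - treat this as
illustrative rather than authoritative:

/* Paraphrase of the cost_nonsequential_access() interpolation in
 * costsize.c (from my reading of the comments there). Cost per page
 * tends to 1.0 (sequential) for tiny tables and to random_page_cost
 * for huge ones. */
double random_page_cost     = 4.0;
double effective_cache_size = 10000.0;      /* in pages */

double
nonsequential_page_cost(double relpages)
{
    if (relpages >= effective_cache_size)
        return random_page_cost
               * (1.0 - (effective_cache_size / relpages) / 2.0);
    else
    {
        double frac = relpages / effective_cache_size;

        return 1.0 + (random_page_cost / 2.0 - 1.0) * frac * frac;
    }
}

Both branches meet at relpages == effective_cache_size with a cost of
random_page_cost / 2, which is where the "50% in cache" reading comes
from.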

IMHO the GUC should be renamed "estimated_cached_blocks", with the old
name deprecated, to force people to re-read the manual description of
what effective_cache_size means and then set it accordingly... all of
that in 8.0...

--
Best Regards, Simon Riggs
