Re: cache control?

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: simon(at)2ndquadrant(dot)com
Cc: "'scott(dot)marlowe'" <scott(dot)marlowe(at)ihs(dot)com>, "'Michael Brusser'" <michael(at)synchronicity(dot)com>, "'Postgresql Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cache control?
Date: 2004-01-26 13:36:59
Message-ID: 401517FA.2060903@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> Jan,

[...]

> My thoughts are about multiple concurrent accesses, specifically FTS on
> large tables, rather than sequential ones.

Single or multiple backends is irrelevant here because a data block only
exists once, and therefore we have only one shared buffer cache.

>
>> Buffers evicted from T1 are remembered in B1, and because of that even
>> repeated sequential scans of the same large relation will only cycle
>> through T1 blocks, never cause any turbulence in T2 or B2.
>
> If we have a situation where a single backend makes repeated scans of
> the same table, these will be sequential and will have no effect on T1.

You really have to look at this a bit more global, not table related.
The strategy of ARC is this:

In an unknown access pattern, if a specific block is accessed less
frequently than every C requests, then it will only go into T1, age, get
evicted and the CDB moves to B1, will get removed from that and is
forgotten. Every block that is accessed more frequently than C will be
after it's last access in any of the four queues of the directory and
immediately go into T2.

The adjustment of the target T1 size is an attempt to catch as many
newcomers as possible. If an application does many inserts, it will
access new blocks very soon again, so that a small T1 is sufficient to
hold them in memory until their next access where they move into T2. An
application that does non-uniform random access to blocks (there are
always bestsellers and less frequently asked items), then a larger T1
might better satisfy that access pattern.

>
> In a DW situation, you are likely to have one or more very popular large
> tables (maybe think of this as the "Fact table", if you have a
> dimensional design). The tables are large and therefore query execution
> times will be extended (and accepted by user). In this situation it is
> very likely that: i) a single user/app submits multiple requests from
> other windows/threads
> Or simply,
> ii) multiple users access the popular table

If that causes that it's blocks are more frequently requested than every
C lookups, it belongs into T2.

>
> The common effect will be concurrent, rather than sequential, access to
> the popular table. Different SQL statements will have different plans
> and will perform scans of the same table at different rates because of
> other joins, more complex WHERE clauses etc. Like waves at a beach
> moving at different rates. Every time one scan catches up with another,
> it will cause T1 hits for almost the whole of the T1 list, promoting all
> of these blocks to the top of the T2 MRU and thus spoiling the cache -
> if it hits one it will probably hit most of them. This will not happen
> ALL the time, but I don't want it to happen EVER. Even in DW situation,
> I still want to be inserting data regularly (that's how the table got
> big!), so I have index blocks and other stuff that I want almost
> permanently in memory. Concurrent access via an index might have the
> same effect, though less dramatically.
>
> The closer the size of a table I to C, the greater the likelihood that
> these spurious cache hits will occur. (Of course, it might be argued
> that these are worthwhile and genuine cache hits - I argue that they are
> not wanted and this is the main basis of my discussion). Of course, if a
> table does fit in memory than that is very good. If a table was, say
> 2*C, then spurious cache hits will occur often and spoil the whole of
> T2.

How can any generic algorithm ever sense that when the application is
accessing the same blocks multiple times, it should NOT cache them? Are
you asking for a fine granulated tuning of cache priorities and
behaviour on a per table basis?

>
> The DBT-3 workload is very similar to TPC-D/TPC-H workload. The test
> consists of a power test (all queries sequentially in random order) and
> a throughput test (1 or more concurrent streams, each stream executing
> all queries in a random order). When this benchmark first came out most
> vendors chose to perform the throughput test with only 1 stream (though
> with parallel processing)...I would say one reason for this is poor
> cache management...hence recent changes in various commercial products.
>
> In summary, I believe there is a reasonably common effect in DW
> situations where concurrent query access to large and popular tables
> will result in undesirable cache spoiling. This effect will still occur
> even after the ARC improvements are introduced - though in every other
> case I can think of, the ARC code is a major improvement on earlier
> strategies and should be hailed as a major improvement in automatic
> performance adaptation.
>
> There are two solution ideas:
> i) change the code so that FTS on large tables use the "no cache"
> strategy that has already been developed to support Vaccuum.
> ii) more complex: synchronise the FTS of the large table so that all
> backends that want scans produce only one set of I/Os and they share the
> block many times (yet still don't put it in cache!). FTS don't start at
> "the beginning" every time, they start wherever a current scan has got
> to, then loop back round at end (so average of two concurrent scans is
> 1.5 times as much I/O as a single FTS - a 25% saving on I/O). A more
> detailed explanation may be required - this technique is in commercial
> use within the Teradata rdbms. Implementing it would take some doing...

How will the configuration of all that look like? You are using several
business terms a human brain can imagine to describe various access
patterns you want to be treated specially. In the whole system catalog
and all the way down to the buffer cache, we only have some file and
block number, maybe the size of it too but that's not guaranteed (think
of blind writes by a backend of another DB). So how do we express what
you want in some algorithm that we can put into the strategy?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2004-01-26 14:36:59 Re: Corrupted db?
Previous Message Curt Sampson 2004-01-26 13:31:16 pg_dump and CHECK constraints

Browse pgsql-patches by date

  From Date Subject
Next Message Peter Eisentraut 2004-01-26 16:02:27 Re: Clarify libpq docs
Previous Message Simon Riggs 2004-01-26 12:55:20 Re: cache control?