Re: Protect syscache from bloating with negative cache entries

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2019-02-12 17:33:46
Message-ID: d3b291ff-d993-78d1-8d28-61bcf72793d6@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2/12/19 12:35 PM, Kyotaro HORIGUCHI wrote:
> Thank you for testing and the commits, Tomas.
>
> At Sat, 9 Feb 2019 19:09:59 +0100, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote in <74386116-0bc5-84f2-e614-0cff19aca2de(at)2ndquadrant(dot)com>
>> On 2/7/19 1:18 PM, Kyotaro HORIGUCHI wrote:
>>> At Thu, 07 Feb 2019 15:24:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190207(dot)152418(dot)139132570(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> I've done a bunch of benchmarks on v13, and I don't see any serious
>> regression either. Each test creates a number of tables (100, 1k, 10k,
>> 100k and 1M) and then runs SELECT queries on them. The tables are
>> accessed randomly - with either uniform or exponential distribution. For
>> each combination there are 5 runs, 60 seconds each (see the attached
>> shell scripts, it should be pretty obvious).
>>
>> I've done the tests on two different machines - small one (i5 with 8GB
>> of RAM) and large one (e5-2620v4 with 64GB RAM), but the behavior is
>> almost exactly the same (with the exception of 1M tables, which does not
>> fit into RAM on the smaller one).
>>
>> On the xeon, the results (throughput compared to master) look like this:
>>
>>
>> uniform 100 1000 10000 100000 1000000
>> ------------------------------------------------------------
>> v13 105.04% 100.28% 102.96% 102.11% 101.54%
>> v13 (nodata) 97.05% 98.30% 97.42% 96.60% 107.55%
>>
>>
>> exponential 100 1000 10000 100000 1000000
>> ------------------------------------------------------------
>> v13 100.04% 103.48% 101.70% 98.56% 103.20%
>> v13 (nodata) 97.12% 98.43% 98.86% 98.48% 104.94%
>>
>> The "nodata" case means the tables were empty (so no files created),
>> while in the other case each table contained 1 row.
>>
>> Per the results it's mostly break even, and in some cases there is
>> actually a measurable improvement.
>
> Great! I guess it comes from reduced size of hash?
>

Not sure about that. I haven't actually verified that it reduces the
cache size at all - I was measuring the overhead of the extra work. And
I don't think the syscache actually shrunk significantly, because the
throughput was quite high (~15-30k tps, IIRC) so pretty much everything
was touched within the default 600 seconds.

>> That being said, the question is whether the patch actually reduces
>> memory usage in a useful way - that's not something this benchmark
>> validates. I plan to modify the tests to make pgbench script
>> time-dependent (i.e. to pick a subset of tables depending on time).
>
> Thank you.
>
>> A couple of things I've happened to notice during a quick review:
>>
>> 1) The sgml docs in 0002 talk about "syscache_memory_target" and
>> "syscache_prune_min_age", but those options were renamed to just
>> "cache_memory_target" and "cache_prune_min_age".
>
> I'm at a loss how call syscache for users. I think it is "catalog
> cache". The most basic component is called catcache, which is
> covered by the syscache layer, both of then are not revealed to
> users, and it is shown to user as "catalog cache".
>
> "catalog_cache_prune_min_age", "catalog_cache_memory_target", (if
> exists) "catalog_cache_entry_limit" and
> "catalog_cache_prune_ratio" make sense?
>

I think "catalog_cache" sounds about right, although my point was simply
that there's a discrepancy between sgml docs and code.

>> 2) "cache_entry_limit" is not mentioned in sgml docs at all, and it's
>> defined three times in guc.c for some reason.
>
> It is just PoC, added to show how it looks. (The multiple
> instances must bex a result of a convulsion of my fingers..) I
> think this is not useful unless it can be specfied per-relation
> or per-cache basis. I'll remove the GUC and add reloptions for
> the purpose. (But it won't work for pg_class and pg_attribute
> for now).
>

OK, although I'd just keep it as simple as possible. TBH I can't really
imagine users tuning limits for individual caches in any meaningful way.

>> 3) I don't see why to define PRUNE_BY_AGE and PRUNE_BY_NUMBER, instead
>> of just using two bool variables prune_by_age and prune_by_number doing
>> the same thing.
>
> Agreed. It's a kind of memory-stingy, which is useless there.
>
>> 4) I'm not entirely sure about using stmtStartTimestamp. Doesn't that
>> pretty much mean long-running statements will set the lastaccess to very
>> old timestamp? Also, it means that long-running statements (like a PL
>> function accessing a bunch of tables) won't do any eviction at all, no?
>> AFAICS we'll set the timestamp only once, at the very beginning.
>>
>> I wonder whether using some other timestamp source (like a timestamp
>> updated regularly from a timer, or something like that).
>
> I didin't consider planning that happen within a function. If
> 5min is the default for catalog_cache_prune_min_age, 10% of it
> (30s) seems enough and gettieofday() with such intervals wouldn't
> affect forground jobs. I'd choose catalog_c_p_m_age/10 rather
> than fixed value 30s and 1s as the minimal.
>

Actually, I see CatCacheCleanupOldEntries contains this comment:

/*
* Calculate the duration from the time of the last access to the
* "current" time. Since catcacheclock is not advanced within a
* transaction, the entries that are accessed within the current
* transaction won't be pruned.
*/

which I think is pretty much what I've been saying ... But the question
is whether we need to do something about it.

> I obeserved significant degradation by setting up timer at every
> statement start. The patch is doing the followings to get rid of
> the degradation.
>
> (1) Every statement updates the catcache timestamp as currently
> does. (SetCatCacheClock)
>
> (2) The timestamp is also updated periodically using timer
> separately from (1). The timer starts if not yet at the time
> of (1). (SetCatCacheClock, UpdateCatCacheClock)
>
> (3) Statement end and transaction end don't stop the timer, to
> avoid overhead of setting up a timer. (
>
> (4) But it stops by error. I choosed not to change the thing in
> PostgresMain that it kills all timers on error.
>
> (5) Also changing the GUC catalog_cache_prune_min_age kills the
> timer, in order to reflect the change quickly especially when
> it is shortened.
>

Interesting. What was the frequency of the timer / how often was it
executed? Can you share the code somehow?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Lepikhov 2019-02-12 17:47:32 Re: [PATCH] xlogreader: do not read a file block twice
Previous Message Jeff Janes 2019-02-12 16:58:08 Re: Bloom index cost model seems to be wrong