Re: Protect syscache from bloating with negative cache entries

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: robertmhaas(at)gmail(dot)com, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2020-11-09 02:13:31
Message-ID: 20201109.111331.1652030822420737754.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in
> On 06/11/2020 10:24, Kyotaro Horiguchi wrote:
> > Thank you for the comment!
> > First off, I thought that I managed to eliminate the degradation
> > observed on the previous versions, but significant degradation (1.1%
> > slower) is still seen in on case.
>
> One thing to keep in mind with micro-benchmarks like this is that even
> completely unrelated code changes can change the layout of the code in
> memory, which in turn can affect CPU caching affects in surprising
> ways. If you're lucky, you can see 1-5% differences just by adding a
> function that's never called, for example, if it happens to move other
> code in memory so that a some hot codepath or struct gets split across
> CPU cache lines. It can be infuriating when benchmarking.

True. I sometimes had to make distclean to stabilize such benchmarks..

> > At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas
> > <hlinnaka(at)iki(dot)fi> wrote in
> > (A) original test patch
> > I naively thought that the code path is too short to bury the
> > degradation of additional a few instructions. Actually I measured
> > performance again with the same patch set on the current master and
> > had the more or less the same result.
> > master 8195.58ms, patched 8817.40 ms: +10.75%
> > However, I noticed that the additional call was a recursive call and a
> > jmp inserted for the recursive call seems taking significant
> > time. After avoiding the recursive call, the difference reduced to
> > +0.96% (master 8268.71ms : patched 8348.30ms)
> > Just two instructions below are inserted in this case, which looks
> > reasonable.
> > 8720ff <+31>: cmpl $0xffffffff,0x4ba942(%rip) # 0xd2ca48
> > <catalog_cache_prune_min_age>
> > 872106 <+38>: jl 0x872240 <SearchCatCache1+352> (call to a function)
>
> That's interesting. I think a 1% degradation would be acceptable.
>
> I think we'd like to enable this feature by default though, so the
> performance when it's enabled is also very important.
>
> > (C) inserting bare counter-update code without a branch
> >
> >> Do we actually need a branch there? If I understand correctly, the
> >> point is to bump up a usage counter on the catcache entry. You could
> >> increment the counter unconditionally, even if the feature is not
> >> used, and avoid the branch that way.
> > That change causes 4.9% degradation, which is worse than having a
> > branch.
> > master 8364.54ms, patched 8666.86ms (+4.9%)
> > The additional instructions follow.
> > + 8721ab <+203>: mov 0x30(%rbx),%eax # %eax = ct->naccess
> > + 8721ae <+206>: mov $0x2,%edx
> > + 8721b3 <+211>: add $0x1,%eax # %eax++
> > + 8721b6 <+214>: cmove %edx,%eax # if %eax == 0 then %eax = 2
> > <original code>
> > + 8721bf <+223>: mov %eax,0x30(%rbx) # ct->naccess = %eax
> > + 8721c2 <+226>: mov 0x4cfe9f(%rip),%rax # 0xd42068 <catcacheclock>
> > + 8721c9 <+233>: mov %rax,0x38(%rbx) # ct->lastaccess = %rax
>
> Do you need the "ntaccess == 2" test? You could always increment the
> counter, and in the code that uses ntaccess to decide what to evict,
> treat all values >= 2 the same.
>
> Need to handle integer overflow somehow. Or maybe not: integer
> overflow is so infrequent that even if a hot syscache entry gets
> evicted prematurely because its ntaccess count wrapped around to 0, it
> will happen so rarely that it won't make any difference in practice.

Agreed. Ok, I have prioritized completely avoiding degradation on the
normal path, but laxing that restriction to 1% or so makes the code
far simpler and make the expiration path signifinicantly faster.

Now the branch for counter-increment is removed. For similar
branches for counter-decrement side in CatCacheCleanupOldEntries(),
Min() is compiled into cmovbe and a branch was removed.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-11-09 02:48:42 Re: Protect syscache from bloating with negative cache entries
Previous Message David Rowley 2020-11-09 02:07:34 Re: Hybrid Hash/Nested Loop joins and caching results from subplans