Re: Protect syscache from bloating with negative cache entries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2019-03-27 08:30:37
Message-ID: 20190327.173037.40342566.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Mon, 25 Mar 2019 09:28:57 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoaViV7gFtAiivfBdBZkumvH3_Gey-4G8PF0KHncQSZ_Jw(at)mail(dot)gmail(dot)com>
> On Thu, Mar 7, 2019 at 11:40 PM Ideriha, Takeshi
> <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com> wrote:
> > Just to be sure: we introduced the LRU list in this thread to find the entries older than the threshold time
> > without scanning the whole hash table. If the hash table becomes large, scanning it without an LRU list becomes slow.
>
> Hmm. So, it's a trade-off, right? One option is to have an LRU list,
> which imposes a small overhead on every syscache or catcache operation
> to maintain the LRU ordering. The other option is to have no LRU
> list, which imposes a larger overhead every time we clean up the
> syscaches. My bias is toward thinking that the latter is better,
> because:
>
> 1. Not everybody is going to use this feature, and
>
> 2. Syscache cleanup should be something that only happens every so
> many minutes, and probably while the backend is otherwise idle,
> whereas lookups can happen many times per millisecond.
>
> However, perhaps someone will provide some evidence that casts a
> different light on the situation.

That is close to my feeling. When the cache is enlarged, all entries
are copied into a new hash twice the size of the old one. If pruning
has removed some entries, we may not need to duplicate the whole hash
at all; otherwise the enlargement does that extra scan anyway. And we
don't run the pruning scan more frequently than the interval, so it is
not a bad deal.

> I don't see much point in continuing to review this patch at this
> point. There's been no new version of the patch in 3 weeks, and there
> is -- in my view at least -- a rather frustrating lack of evidence
> that the complexity this patch introduces is actually beneficial. No
> matter how many people +1 the idea of making this more complicated, it
> can't be justified unless you can provide a test result showing that
> the additional complexity solves a problem that does not get solved
> without that complexity. And even then, who is going to commit a
> patch that uses a design which Tom Lane says was tried before and
> stunk?

Hmm. Anyway, it has been broken by a recent commit. I'll post a rebased
version and a version reverted to do whole-table scanning. Then I'll
take what numbers I can and show the results... tomorrow.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
