Re: Protect syscache from bloating with negative cache entries

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2017-01-13 22:39:04
Message-ID: CAB7nPqRgkyR9p-Lz75no+Dz=t6FeHpXA68LU6-sevKhzdvgTjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 14, 2017 at 12:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jan 13, 2017 at 8:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>>> Have there been ever discussions about having catcache entries in a
>>> shared memory area? This does not sound much performance-wise, I am
>>> just wondering about the concept and I cannot find references to such
>>> discussions.
>>
>> I'm sure it's been discussed. Offhand I remember the following issues:
>>
>> * A shared cache would create locking and contention overhead.
>>
>> * A shared cache would have a very hard size limit, at least if it's
>> in SysV-style shared memory (perhaps DSM would let us relax that).
>>
>> * Transactions that are doing DDL have a requirement for the catcache
>> to reflect changes that they've made locally but not yet committed,
>> so said changes mustn't be visible globally.
>>
>> You could possibly get around the third point with a local catcache that's
>> searched before the shared one, but tuning that to be performant sounds
>> like a mess. Also, I'm not sure how such a structure could cope with
>> uncommitted deletions: delete A -> remove A from local catcache, but not
>> the shared one -> search for A in local catcache -> not found -> search
>> for A in shared catcache -> found -> oops.
>
> I think the first of those concerns is the key one. If searching the
> system catalogs costs $100 and searching the private catcache costs
> $1, what's the cost of searching a hypothetical shared catcache? If
> the answer is $80, it's not worth doing. If the answer is $5, it's
> probably still not worth doing. If the answer is $1.25, then it's
> probably worth investing some energy into trying to solve the other
> problems you list. For some users, the memory cost of catcache and
> syscache entries multiplied by N backends are a very serious problem,
> so it would be nice to have some other options. But we do so many
> syscache lookups that a shared cache won't be viable unless it's
> almost as fast as a backend-private cache, or at least that's my
> hunch.

Being able to switch from one mode to another would be interesting.
Applications using extensing DDLs that require to change the catcache
with an exclusive lock would clearly pay the lock contention cost, but
do you think that be really the case of a shared lock? A bunch of
applications that I work with deploy Postgres once, then don't change
the schema except when an upgrade happens. So that would be benefitial
for that. There are even some apps that do not use pgbouncer, but drop
sessions after a timeout of inactivity to avoid a memory bloat because
of the problem of this thread. That won't solve the problem of the
local catcache bloat, but some users using few DDLs may be fine to pay
some extra concurrency cost if the session handling gets easied.

> I think it would be interested for somebody to build a prototype here
> that ignores all the problems but the first and uses some
> straightforward, relatively unoptimized locking strategy for the first
> problem. Then benchmark it. If the results show that the idea has
> legs, then we can try to figure out what a real implementation would
> look like.
> (One possible approach: use Thomas Munro's DHT stuff to build the shared cache.)

Yeah, I'd bet on a couple of days of focus to sort that out.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-01-13 22:43:02 Re: Patch to implement pg_current_logfile() function
Previous Message Fabien COELHO 2017-01-13 22:22:57 Re: BUG: pg_stat_statements query normalization issues with combined queries