Re: [Patch] Optimize dropping of relation buffers using dlist

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-08-07 12:20:27
Message-ID: 20200807122027.n6eoupavqgb2ueuf@development
Lists: pgsql-hackers

On Fri, Aug 07, 2020 at 10:08:23AM +0300, Konstantin Knizhnik wrote:
>
>
>On 07.08.2020 00:33, Tomas Vondra wrote:
>>
>>Unfortunately Konstantin did not share any details about what workloads
>>he tested, what config etc. But I find the "no regression" hypothesis
>>rather hard to believe, because we're adding non-trivial amount of code
>>to a place that can be quite hot.
>
>Sorry that I have not explained my test scenarios.
>Since Postgres is a pgbench-oriented database :) I have also used
>pgbench: a read-only case and a case with some updates.
>The most critical factor for this patch is the number of buffer
>allocations, so I used a small enough database (scale=100), but shared
>buffers were set to 1GB.
>As a result, all data is cached in memory (in the file system cache),
>but there is intensive swapping at the Postgres buffer manager level.
>I have tested it both with a relatively small (100) and a large (1000)
>number of clients.
>
>I repeated these tests on my notebook (quad-core, 16GB RAM, SSD) and on
>an IBM Power2 server with about 380 virtual cores and about 1TB of
>memory. In the latter case the results vary very much (I think because
>of the NUMA architecture), but I failed to find any noticeable
>regression in the patched version.
>

IMO using such high numbers of clients is pointless - it's perfectly
fine to test just a single client, and the 'basic overhead' should be
visible. It might have some impact on concurrency, but I think that's
just a secondary effect. In fact, I wouldn't be surprised if high
client counts made it harder to observe the overhead, due to
concurrency problems (I doubt you have a laptop with this many cores).

Another thing you might try doing is using taskset to attach processes
to particular CPU cores, and also make sure there's no undesirable
influence from CPU power management etc. Laptops are very problematic in
this regard, but even servers can have that enabled in BIOS.

>
>But I have to agree that adding a parallel hash (in addition to the
>existing buffer manager hash) is not such a good idea.
>This cache really quite frequently becomes a bottleneck.
>My explanation of why I have not observed any noticeable regression is
>that this patch uses almost the same lock partitioning schema as the
>existing one, so it adds not that many new conflicts. Maybe in the case
>of the Power2 server, the overhead of NUMA is much higher than other
>factors (although a shared hash is one of the main things that suffer
>from a NUMA architecture).
>But in principle I agree that having two independent caches may
>decrease speed up to two times (or even more).
>
>I hope that everybody will agree that this problem is really critical.
>It is certainly not the most common case to have hundreds of relations
>which are frequently truncated. But having quadratic complexity in the
>drop function is not acceptable from my point of view.
>And it is not only a recovery-specific problem, which is why a solution
>with a local cache is not enough.
>

Well, ultimately it's a balancing act - we need to consider the risk of
regressions vs. how common the improved scenario is. I've seen multiple
applications that e.g. drop many relations (after all, that's why I
optimized that in 9.3), so it's not an entirely bogus case.

>I do not know a good solution to the problem. Just some thoughts:
>- We can somehow combine the locking used for the main buffer manager
>cache (by relid/blockno) and the cache by relid. It would eliminate the
>double locking overhead.
>- We can use something like a sorted tree (like std::map) instead of a
>hash - it would allow locating blocks both by relid/blockno and by
>relid only.
>

I don't know. I think the ultimate problem here is that we're adding
code to a fairly hot codepath - it does not matter if it's a hash, list,
std::map or something else. All of that has overhead.

That's the beauty of Andres' proposal to just loop over the blocks of
the relation and evict them one by one - that adds absolutely nothing to
BufferAlloc.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
