Re: BufferAlloc: don't take two simultaneous locks

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, michail(dot)nikolaev(at)gmail(dot)com, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: 2022-05-10 22:50:08
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

В Пт, 06/05/2022 в 10:26 -0400, Robert Haas пишет:
> On Thu, Apr 21, 2022 at 6:58 PM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> > At the master state:
> > - SharedBufHash is not declared as HASH_FIXED_SIZE
> > - get_hash_entry falls back to element_alloc too fast (just if it doesn't
> > found free entry in current freelist partition).
> > - get_hash_entry has races.
> > - if there are small number of spare items (and NUM_BUFFER_PARTITIONS is
> > small number) and HASH_FIXED_SIZE is set, it becomes contended and
> > therefore slow.
> >
> > HASH_REUSE solves (for shared buffers) most of this issues. Free list
> > became rare fallback, so HASH_FIXED_SIZE for SharedBufHash doesn't lead
> > to performance hit. And with fair number of spare items, get_hash_entry
> > will find free entry despite its races.
> Hmm, I see. The idea of trying to arrange to reuse entries rather than
> pushing them onto a freelist and immediately trying to take them off
> again is an interesting one, and I kind of like it. But I can't
> imagine that anyone would commit this patch the way you have it. It's
> way too much action at a distance. If any ereport(ERROR,...) could
> happen between the HASH_REUSE operation and the subsequent HASH_ENTER,
> it would be disastrous, and those things are separated by multiple
> levels of call stack across different modules, so mistakes would be
> easy to make. If this could be made into something dynahash takes care
> of internally without requiring extensive cooperation with the calling
> code, I think it would very possibly be accepted.
> One approach would be to have a hash_replace() call that takes two
> const void * arguments, one to delete and one to insert. Then maybe
> you propagate that idea upward and have, similarly, a BufTableReplace
> operation that uses that, and then the bufmgr code calls
> BufTableReplace instead of BufTableDelete. Maybe there are other
> better ideas out there...


While HASH_REUSE is a good addition to overall performance improvement
of the patch, it is not required for major gain.

Major gain is from not taking two partition locks simultaneously.

hash_replace would require two locks, so it is not an option.




In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2022-05-10 23:24:15 Re: Query generates infinite loop
Previous Message Tom Lane 2022-05-10 22:46:26 Re: configure openldap crash warning