Re: BufferAlloc: don't take two simultaneous locks

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, michail(dot)nikolaev(at)gmail(dot)com, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: 2022-05-06 14:26:08
Message-ID: CA+Tgmoa=Wxkdn03dtw50etUwdqP1-dJVzb1wvwwvX8UDk4BNYQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 21, 2022 at 6:58 PM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> In the current master state:
> - SharedBufHash is not declared as HASH_FIXED_SIZE
> - get_hash_entry falls back to element_alloc too fast (as soon as it
> doesn't find a free entry in the current freelist partition).
> - get_hash_entry has races.
> - if there is only a small number of spare items (and NUM_BUFFER_PARTITIONS
> is a small number) and HASH_FIXED_SIZE is set, it becomes contended and
> therefore slow.
>
> HASH_REUSE solves (for shared buffers) most of these issues. The free list
> becomes a rare fallback, so HASH_FIXED_SIZE for SharedBufHash doesn't lead
> to a performance hit. And with a fair number of spare items, get_hash_entry
> will find a free entry despite its races.
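
(For reference, the buffer mapping table is currently created in
InitBufTable() roughly like this -- quoting from memory, so details may be
slightly off -- and HASH_FIXED_SIZE is indeed absent from the flags:)

    /* Roughly how buf_table.c sets up the table today (from memory). */
    HASHCTL     info;

    info.keysize = sizeof(BufferTag);
    info.entrysize = sizeof(BufferLookupEnt);
    info.num_partitions = NUM_BUFFER_PARTITIONS;

    /* size is NBuffers plus NUM_BUFFER_PARTITIONS at the call site, IIRC */
    SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
                                  size, size,
                                  &info,
                                  HASH_ELEM | HASH_BLOBS | HASH_PARTITION);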

Hmm, I see. The idea of trying to arrange to reuse entries rather than
pushing them onto a freelist and immediately trying to take them off
again is an interesting one, and I kind of like it. But I can't
imagine that anyone would commit this patch the way you have it. It's
way too much action at a distance. If any ereport(ERROR,...) could
happen between the HASH_REUSE operation and the subsequent HASH_ENTER,
it would be disastrous, and those things are separated by multiple
levels of call stack across different modules, so mistakes would be
easy to make. If this could be made into something dynahash takes care
of internally without requiring extensive cooperation with the calling
code, I think it would very possibly be accepted.
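
To spell out what I mean, the calling pattern the patch sets up is roughly
this (my paraphrase, not the literal patch; HASH_REUSE here is the new
action the patch introduces, not anything that exists today):

    /* In buf_table.c: detach the victim entry and stash it inside dynahash */
    hash_search_with_hash_value(SharedBufHash, &oldTag, oldHash,
                                HASH_REUSE, NULL);

    /*
     * ... back up in bufmgr.c, the caller then does more work: releases one
     * partition lock, takes another, and so on.  If anything in this window
     * can ereport(ERROR), the stashed entry is orphaned, and nothing at this
     * level of the code makes that hazard visible.
     */

    /* Later, in buf_table.c again: the stashed entry is silently reused */
    hash_search_with_hash_value(SharedBufHash, &newTag, newHash,
                                HASH_ENTER, &found);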

One approach would be to have a hash_replace() call that takes two
const void * arguments, one to delete and one to insert. Then maybe
you propagate that idea upward and have, similarly, a BufTableReplace
operation that uses that, and then the bufmgr code calls
BufTableReplace instead of BufTableDelete. Maybe there are other
better ideas out there...
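
Just to make the shape of that concrete, I'm imagining something along
these lines -- a sketch only, with names and signatures invented here,
patterned on the existing dynahash and BufTable functions:

    /*
     * Sketch of a possible dynahash API: delete the entry for oldKeyPtr and
     * insert an entry for newKeyPtr in one call, reusing the deleted element
     * directly instead of cycling it through a freelist.  Returns the new
     * entry; *foundPtr reports whether newKeyPtr already existed.
     */
    extern void *hash_replace(HTAB *hashp,
                              const void *oldKeyPtr,
                              const void *newKeyPtr,
                              bool *foundPtr);

    /*
     * Corresponding buf_table.c wrapper that BufferAlloc() could call instead
     * of separate BufTableDelete() and BufTableInsert() calls.  Signature is
     * guesswork, modeled on the existing BufTable functions.
     */
    extern int BufTableReplace(BufferTag *oldTagPtr, uint32 oldHash,
                               BufferTag *newTagPtr, uint32 newHash,
                               int buf_id);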

--
Robert Haas
EDB: http://www.enterprisedb.com
