Re: BufferAlloc: don't take two simultaneous locks

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, michail(dot)nikolaev(at)gmail(dot)com, x4mmm(at)yandex-team(dot)ru, andres(at)anarazel(dot)de, simon(dot)riggs(at)enterprisedb(dot)com
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: 2022-04-14 05:58:33
Message-ID: 67e127f79577a6e0ab287a0e8bdf35b8d9419e13.camel@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

В Пт, 08/04/2022 в 16:46 +0900, Kyotaro Horiguchi пишет:
> At Thu, 07 Apr 2022 14:14:59 +0300, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote in
> > В Чт, 07/04/2022 в 16:55 +0900, Kyotaro Horiguchi пишет:
> > > Hi, Yura.
> > >
> > > At Wed, 06 Apr 2022 16:17:28 +0300, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrot
> > > e in
> > > > Ok, I got access to stronger server, did the benchmark, found weird
> > > > things, and so here is new version :-)
> > >
> > > Thanks for the new version and benchmarking.
> > >
> > > > First I found if table size is strictly limited to NBuffers and FIXED,
> > > > then under high concurrency get_hash_entry may not find free entry
> > > > despite it must be there. It seems while process scans free lists, other
> > > > concurrent processes "moves etry around", ie one concurrent process
> > > > fetched it from one free list, other process put new entry in other
> > > > freelist, and unfortunate process missed it since it tests freelists
> > > > only once.
> > >
> > > StrategyGetBuffer believes that entries don't move across freelists
> > > and it was true before this patch.
> >
> > StrategyGetBuffer knows nothing about dynahash's freelist.
> > It knows about buffer manager's freelist, which is not partitioned.
>
> Yeah, right. I meant get_hash_entry.

But entries doesn't move.
One backends takes some entry from one freelist, other backend puts
other entry to other freelist.

> > > I don't think that causes significant performance hit, but I don't
> > > understand how it improves freelist hit ratio other than by accident.
> > > Could you have some reasoning for it?
> >
> > Since free_reused_entry returns entry into random free_list, this
> > probability is quite high. In tests, I see stabilisa
>
> Maybe. Doesn't it improve the efficiency if we prioritize emptied
> freelist on returning an element? I tried it with an atomic_u32 to
> remember empty freelist. On the uin32, each bit represents a freelist
> index. I saw it eliminated calls to element_alloc. I tried to
> remember a single freelist index in an atomic but there was a case
> where two freelists are emptied at once and that lead to element_alloc
> call.

I thought on bitmask too.
But doesn't it return contention which many freelists were against?
Well, in case there are enough entries to keep it almost always "all
set", it would be immutable.

> > > By the way the change of get_hash_entry looks something wrong.
> > >
> > > If I understand it correctly, it visits num_freelists/4 freelists at
> > > once, then tries element_alloc. If element_alloc() fails (that must
> > > happen), it only tries freeList[freelist_idx] and gives up, even
> > > though there must be an element in other 3/4 freelists.
> >
> > No. If element_alloc fails, it tries all NUM_FREELISTS again.
> > - condition: `ntries || !allocFailed`. `!allocFailed` become true,
> > so `ntries` remains.
> > - `ntries = num_freelists;` regardless of `allocFailed`.
> > Therefore, all `NUM_FREELISTS` are retried for partitioned table.
>
> Ah, okay. ntries is set to num_freelists after calling element_alloc.
> I think we (I?) need more comments.
>
> By the way, why it is num_freelists / 4 + 1?

Well, num_freelists could be 1 or 32.
If num_freelists is 1 then num_freelists / 4 == 0 - not good :-)

------

regards

Yura Sokolov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-04-14 06:27:59 Re: Assert in pageinspect with NULL pages
Previous Message Julien Rouhaud 2022-04-14 05:50:24 Re: make MaxBackends available in _PG_init