Re: BufferAlloc: don't take two simultaneous locks

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: y(dot)sokolov(at)postgrespro(dot)ru
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, michail(dot)nikolaev(at)gmail(dot)com, x4mmm(at)yandex-team(dot)ru, andres(at)anarazel(dot)de, simon(dot)riggs(at)enterprisedb(dot)com
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: 2022-03-14 00:39:48
Message-ID: 20220314.093948.1426408606053906565.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Fri, 11 Mar 2022 11:30:27 +0300, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote in
> В Пт, 11/03/2022 в 15:30 +0900, Kyotaro Horiguchi пишет:
> > At Thu, 03 Mar 2022 01:35:57 +0300, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote in
> > > В Вт, 01/03/2022 в 10:24 +0300, Yura Sokolov пишет:
> > > > Ok, here is v4.
> > >
> > > And here is v5.
> > >
> > > First, there was compilation error in Assert in dynahash.c .
> > > Excuse me for not checking before sending previous version.
> > >
> > > Second, I add third commit that reduces HASHHDR allocation
> > > size for non-partitioned dynahash:
> > > - moved freeList to last position
> > > - alloc and memset offset(HASHHDR, freeList[1]) for
> > > non-partitioned hash tables.
> > > I didn't benchmarked it, but I will be surprised if it
> > > matters much in performance sence.
> > >
> > > Third, I put all three commits into single file to not
> > > confuse commitfest application.
> >
> > Thanks! I looked into dynahash part.
> >
> > struct HASHHDR
> > {
> > - /*
> > - * The freelist can become a point of contention in high-concurrency hash
> >
> > Why did you move around the freeList?
> >
> >
> > - long nentries; /* number of entries in associated buckets */
> > + long nfree; /* number of free entries in the list */
> > + long nalloced; /* number of entries initially allocated for
> >
> > Why do we need nfree? HASH_ASSING should do the same thing with
> > HASH_REMOVE. Maybe the reason is the code tries to put the detached
> > bucket to different free list, but we can just remember the
> > freelist_idx for the detached bucket as we do for hashp. I think that
> > should largely reduce the footprint of this patch.
>
> If we keep nentries, then we need to fix nentries in both old
> "freeList" partition and new one. It is two freeList[partition]->mutex
> lock+unlock pairs.
>
> But count of free elements doesn't change, so if we change nentries
> to nfree, then no need to fix freeList[partition]->nfree counters,
> no need to lock+unlock.

Ah, okay. I missed that bucket reuse chages key in most cases.

But still I don't think its good to move entries around partition
freelists for another reason. I'm afraid that the freelists get into
imbalanced state. get_hash_entry prefers main shmem allocation than
other freelist so that could lead to freelist bloat, or worse
contension than the traditinal way involving more than two partitions.

I'll examine the possibility to resolve this...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-03-14 00:44:22 Re: BufferAlloc: don't take two simultaneous locks
Previous Message Tom Lane 2022-03-14 00:35:44 Re: On login trigger: take three