Re: BufferAlloc: don't take two simultaneous locks

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: y(dot)sokolov(at)postgrespro(dot)ru
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, michail(dot)nikolaev(at)gmail(dot)com, x4mmm(at)yandex-team(dot)ru, andres(at)anarazel(dot)de, simon(dot)riggs(at)enterprisedb(dot)com
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: 2022-03-16 03:07:47
Message-ID: 20220316.120747.974686338659732838.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 15 Mar 2022 13:47:17 +0300, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote in
> В Вт, 15/03/2022 в 16:25 +0900, Kyotaro Horiguchi пишет:
> > Hmm. v8 returns stashed element with original patition index when the
> > element is *not* reused. But what I saw in the previous test runs is
> > the REUSE->ENTER(reuse)(->REMOVE) case. So the new version looks like
> > behaving the same way (or somehow even worse) with the previous
> > version.
>
> v8 doesn't differ in REMOVE case neither from master nor from
> previous version. It differs in RETURNED case only.
> Or I didn't understand what you mean :(

In v7, HASH_ENTER returns the element stored in DynaHashReuse using
the freelist_idx of the new key. v8 uses that of the old key (at the
time of HASH_REUSE). So in the case "REUSE->ENTER(elem exists and
returns the stashed)" case the stashed element is returned to its
original partition. But it is not what I mentioned.

On the other hand, once the stahsed element is reused by HASH_ENTER,
it gives the same resulting state with HASH_REMOVE->HASH_ENTER(borrow
from old partition) case. I suspect that ththat the frequent freelist
starvation comes from the latter case.

> > get_hash_entry continuously suffer lack of freelist
> > entry. (FWIW, attached are the test-output diff for both master and
> > patched)
> >
> > master finally allocated 31 fresh elements for a 100s run.
> >
> > > ALLOCED: 31 ;; freshly allocated
> >
> > v8 finally borrowed 33620 times from another freelist and 0 freshly
> > allocated (ah, this version changes that..)
> > Finally v8 results in:
> >
> > > RETURNED: 50806 ;; returned stashed elements
> > > BORROWED: 33620 ;; borrowed from another freelist
> > > REUSED: 1812664 ;; stashed
> > > ASSIGNED: 1762377 ;; reused
> > >(ALLOCED: 0) ;; freshly allocated

(I misunderstand that v8 modified get_hash_entry's preference between
allocation and borrowing.)

I re-ran the same check for v7 and it showed different result.

RETURNED: 1
ALLOCED: 15
BORROWED: 0
REUSED: 505435
ASSIGNED: 505462 (-27) ## the counters are not locked.

> Is there any measurable performance hit cause of borrowing?
> Looks like "borrowed" happened in 1.5% of time. And it is on 128kb
> shared buffers that is extremely small. (Or it was 128MB?)

It is intentional set small to get extremely frequent buffer
replacements. The point here was the patch actually can induce
frequent freelist starvation. And as you do, I also doubt the
significance of the performance hit by that. Just I was not usre.

I re-ran the same for v8 and got a result largely different from the
previous trial on the same v8.

RETURNED: 2
ALLOCED: 0
BORROWED: 435
REUSED: 495444
ASSIGNED: 495467 (-23)

Now "BORROWED" happens 0.8% of REUSED.

> Well, I think some spare entries could reduce borrowing if there is
> a need. I'll test on 128MB with spare entries. If there is profit,
> I'll return some, but will keep SharedBufHash fixed.

I don't doubt the benefit of this patch. And now convinced by myself
that the downside is negligible than the benefit.

> Master branch does less freelist manipulations since it tries to
> insert first and if there is collision it doesn't delete victim
> buffer.
>
> > > I lost access to Xeon 8354H, so returned to old Xeon X5675.
> > ...
> > > Strange thing: both master and patched version has higher
> > > peak tps at X5676 at medium connections (17 or 27 clients)
> > > than in first october version [1]. But lower tps at higher
> > > connections number (>= 191 clients).
> > > I'll try to bisect on master this unfortunate change.
> >
> > The reversing of the preference order between freshly-allocation and
> > borrow-from-another-freelist might affect.
>
> `master` changed its behaviour as well.
> It is not problem of the patch at all.

Agreed. So I think we should go on this direction.

There are some last comments on v8.

+ HASH_FIXED_SIZE);

Ah, now I understand that this prevented allocation of new elements.
I think this good to do for SharedBufHash.

====
+ long nfree; /* number of free entries in the list */
HASHELEMENT *freeList; /* chain of free elements */
} FreeListData;

+#if SIZEOF_LONG == 4
+typedef pg_atomic_uint32 nalloced_store_t;
+typedef uint32 nalloced_value_t;
+#define nalloced_read(a) (long)pg_atomic_read_u32(a)
+#define nalloced_add(a, v) pg_atomic_fetch_add_u32((a), (uint32)(v))
====

I don't think nalloced needs to be the same width to long. For the
platforms with 32-bit long, anyway the possible degradation if any by
64-bit atomic there doesn't matter. So don't we always define the
atomic as 64bit and use the pg_atomic_* functions directly?

+ case HASH_REUSE:
+ if (currBucket != NULL)

Don't we need an assertion on (DunaHashReuse.element == NULL) here?

- size = add_size(size, BufTableShmemSize(NBuffers + NUM_BUFFER_PARTITIONS));
+ /* size of lookup hash table */
+ size = add_size(size, BufTableShmemSize(NBuffers));

I was not sure that this is safe, but actually I didn't get "out of
shared memory". On second thought, I realized that when a dynahash
entry is stashed, BufferAlloc always holding a buffer block, too.
So now I'm sure that this is safe.

That's all.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-03-16 03:08:21 Re: Add 64-bit XIDs into PostgreSQL 15
Previous Message Thomas Munro 2022-03-16 03:04:30 Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad