Re: BufferAlloc: don't take two simultaneous locks

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: y(dot)sokolov(at)postgrespro(dot)ru
Cc: michail(dot)nikolaev(at)gmail(dot)com, x4mmm(at)yandex-team(dot)ru, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: 2022-02-17 05:16:47
Message-ID: 20220217.141647.512059035403445205.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 16 Feb 2022 10:40:56 +0300, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote in
> Hello, all.
>
> I thought about patch simplification, and tested version
> without BufTable and dynahash api change at all.
>
> It performs suprisingly well. It is just a bit worse
> than v1 since there is more contention around dynahash's
> freelist, but most of improvement remains.
>
> I'll finish benchmarking and will attach graphs with
> next message. Patch is attached here.

Thanks for the new patch. The patch as a whole looks fine to me. But
some comments needs to be revised.

(existing comments)
> * To change the association of a valid buffer, we'll need to have
> * exclusive lock on both the old and new mapping partitions.
...
> * Somebody could have pinned or re-dirtied the buffer while we were
> * doing the I/O and making the new hashtable entry. If so, we can't
> * recycle this buffer; we must undo everything we've done and start
> * over with a new victim buffer.

We no longer take a lock on the new partition and have no new hash
entry (if others have not yet done) at this point.

+ * Clear out the buffer's tag and flags. We must do this to ensure that
+ * linear scans of the buffer array don't think the buffer is valid. We

The reason we can clear out the tag is it's safe to use the victim
buffer at this point. This comment needs to mention that reason.

+ *
+ * Since we are single pinner, there should no be PIN_COUNT_WAITER or
+ * IO_IN_PROGRESS (flags that were not cleared in previous code).
+ */
+ Assert((oldFlags & (BM_PIN_COUNT_WAITER | BM_IO_IN_PROGRESS)) == 0);

It seems like to be a test for potential bugs in other functions. As
the comment is saying, we are sure that no other processes are pinning
the buffer and the existing code doesn't seem to be care about that
condition. Is it really needed?

+ /*
+ * Try to make a hashtable entry for the buffer under its new tag. This
+ * could fail because while we were writing someone else allocated another

The most significant point of this patch is the reason that the victim
buffer is protected from stealing until it is set up for new tag. I
think we need an explanation about the protection here.

+ * buffer for the same block we want to read in. Note that we have not yet
+ * removed the hashtable entry for the old tag.

Since we have removed the hash table entry for the old tag at this
point, the comment got wrong.

+ * the first place. First, give up the buffer we were planning to use
+ * and put it to free lists.
..
+ StrategyFreeBuffer(buf);

This is one downside of this patch. But it seems to me that the odds
are low that many buffers are freed in a short time by this logic. By
the way it would be better if the sentence starts with "First" has a
separate comment section.

(existing comment)
| * Okay, it's finally safe to rename the buffer.

We don't "rename" the buffer here. And the safety is already
establishsed at the end of the oldPartitionLock section. So it would
be just something like "Now allocate the victim buffer for the new
tag"?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-02-17 05:56:37 Re: [PATCH] Fix out-of-bouds access (src/common/wchar.c)
Previous Message Dilip Kumar 2022-02-17 04:49:13 Re: Per-table storage parameters for TableAM/IndexAM extensions