| From: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
| Cc: | Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
| Subject: | Re: Buffer locking is special (hints, checksums, AIO writes) |
| Date: | 2026-03-31 16:02:33 |
| Message-ID: | 5bf667f3-5270-4b19-a08f-0facbecdff68@postgrespro.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
27.03.2026 23:00, Andres Freund wrote:
> Hi,
>
> On 2026-03-25 18:35:55 -0400, Andres Freund wrote:
>> Running it through valgrind and then will work on reading through one more
>> time and pushing them.
>
> And done.
>
> Phew, this project took way longer than I'd though it'd take.
In addition to bug with BM_IO_ERROR [1] , I found race condition in
PinBuffer in this lines of code:
if (unlikely(skip_if_not_valid && !(old_buf_state & BM_VALID)))
return false;
/*
* We're not allowed to increase the refcount while the buffer
* header spinlock is held. Wait for the lock to be released.
*/
if (old_buf_state & BM_LOCKED)
old_buf_state = WaitBufHdrUnlocked(buf);
While we waited for buffer header for being unlocked, it may become
invalid, isn't it?
Therefore, check related to skip_if_not_valid have to happen after waiting.
....
Another question: previously we had to wait for buffer for being unlocked
because UnlockBufHdr wrote to buf->state unconditionally, therefore our pin
increment could be lost.
Now UnlockBufHdr and UnlockBufHdrExt does proper atomic operations and
preserves concurrent changes. Are we still need to wait?
Most of time PinBuffer is called protected by BufTable's partition LWLock,
therefore buffer may not be changed in dramatic way.
But call in ReadRecentBuffer is the exception. It is not protected by
partition lock and have to make additional checks. That is why you
introduced skip_if_not_valid.
Does optimization of ReadRecentBuffer pays for WaitBufHdrUnlocked?
[1]
https://www.postgresql.org/message-id/57a8ea70-32a8-4596-bb68-5d2990b83380%40postgrespro.ru
--
regards
Yura Sokolov aka funny-falcon
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2026-03-31 16:02:51 | AIO / read stream heuristics adjustments for index prefetching |
| Previous Message | Junwang Zhao | 2026-03-31 15:54:29 | Re: Eliminating SPI / SQL from some RI triggers - take 3 |