From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Julien Rouhaud <rjuju123(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Subject: | Re: Online checksums verification in the backend |
Date: | 2020-11-02 01:39:40 |
Message-ID: | 20201102013940.tk5e2co2maedzksw@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I'm a bit limited writing - one handed for a while following an injury
on Friday...
On 2020-11-02 10:05:25 +0900, Michael Paquier wrote:
> On Thu, Oct 29, 2020 at 10:08:52PM -0700, Andres Freund wrote:
> > I think its pretty much *never* OK to do IO while holding a buffer
> > mapping lock. You're locking a significant fraction of shared buffers
> > over IO. That's just not OK. Don't think there's any place doing so
> > currently either.
>
> There is no place doing that on HEAD.
Err?
/* see if the block is in the buffer pool or not */
LWLockAcquire(partLock, LW_SHARED);
buf_id = BufTableLookup(&buf_tag, buf_hash);
if (buf_id >= 0)
{
...
/* Read the buffer from disk, with the I/O lock still held */
smgrread(smgr, forknum, blkno, buffer);
LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
}
else
{
/*
* Simply read the buffer. There's no risk of modification on it as
* we are holding the buffer pool partition mapping lock.
*/
smgrread(smgr, forknum, blkno, buffer);
}
/* buffer lookup done, so now do its check */
LWLockRelease(partLock);
How is this not doing IO while holding a buffer mapping lock?
> This specific point was mentioned in the first message of this thread,
> 7th paragraph. That was a long thread, so it is easy to miss:
> https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com
The code clearly doesnt implement it that way.
> I am wondering what you have in mind regarding the use of
> BM_IO_IN_PROGRESS or a similar flag. Wouldn't that imply some
> consequences for other existing buffers in the table, like a possible
> eviction?
You'd need exactly one empty buffer for that - it can be reused for the
next to-be-checked buffer.
> I'd like to think that we should not do any manipulation of
> the buffer tables in this case.
Why? Its the way we lock buffers - why is this so special that we need
to do differently?
> Hence, in order to prevent a
> concurrent activity to load in shared buffers the page currently
> checked on disk, I got to think that we would need something new here,
> like a filtering hash table that would be checked each time a backend
> tries to insert an entry into the buffer tables.
Thats going to slow down everything a bit - the mapping already is a
bottleneck.
> 1) If the buffer is in shared buffers, we have the APIs to solve that
> by using a pin, unlock the partition, and then do the I/O. (Still
> that's unsafe with the smgrwrite() argument?)
Thats why you need an appropriate relation lock... Something CheckBuffer
didnt bother to document. Its a restriction, but one we probably can
live with.
> 2) If the buffer is not in shared buffers, we don't have what it takes
> to solve the problem yet.
We do. Set up enough state for the case to be otherwise the same as the
in s_b case.
> But even if we solve this problem, we will
> never really be sure that this is entirely safe, as per the argument
> with concurrent smgrwrite() calls. Current in-core code assumes that
> this can happen only for init forks of unlogged relations which would
> not be visible yet in the backend doing a page check, still it can be
> really easy to break this assumption with any new code added by a new
> feature.
It also happens in a few other cases than just init forks. But
visibility & relation locking can take care of that. But you need to
document that. If the locking allows concurent readers - and especially
concurrent writers, then you cant really use smgrwite for anything but
relation extension.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2020-11-02 01:45:00 | Re: Online checksums verification in the backend |
Previous Message | Kyotaro Horiguchi | 2020-11-02 01:36:10 | Re: Explicit NULL dereference (src/backend/utils/adt/ruleutils.c) |