Re: about fsync in CLOG buffer write

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: 张广舟(明虚) <guangzhou(dot)zgz(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, 周正中(德歌) <dege(dot)zzz(at)alibaba-inc(dot)com>, 范孝剑(康贤) <funnyxj(dot)fxj(at)alibaba-inc(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, 窦贤明(执白) <xianming(dot)dxm(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>, 陈新坚(惧留孙) <xinjian(dot)chen(at)alibaba-inc(dot)com>
Subject: Re: about fsync in CLOG buffer write
Date: 2015-10-04 19:14:05
Message-ID: CAMkU=1xURO+spuAMZHWc+OPfgxqvG7Ng235E2c8yP2ybA8XCdQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 12, 2015 at 5:21 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On September 12, 2015 5:18:28 PM PDT, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
> wrote:
> >On Wed, Sep 2, 2015 at 5:32 AM, Andres Freund <andres(at)anarazel(dot)de>
> >wrote:
> >
> >> On 2015-09-10 19:39:59 +0800, 张广舟(明虚) wrote:
> >> > We found there is a fsync call when CLOG buffer
> >> > is written out in SlruPhysicalWritePage(). It is often called when a
> >> > backend needs to check transaction status with SimpleLruReadPage().
> >>
> >> That's when there aren't enough buffers available and some other, in
> >> your case dirty, page needs to be written out.
> >>
> >
> >Why bother to find a place to store the page in shared memory at all? If
> >we just want to read it, and it isn't already in shared memory, then why
> >not just ask the kernel for the specific byte we need? The byte we want to
> >read can't differ between shared memory and kernel, because it doesn't
> >exist in shared memory.
>
> I doubt that'd help - the next access would be more expensive, and we'd
> need to have a more complex locking regime. These pages aren't necessarily
> read only at that point.
>

My (naive) expectation is that no additional locking is needed.

Once we decide to consult the clog, we already know the transaction is no
longer in progress, so nothing can still be in flight to change the clog
entry we care about: that update was required to have happened already.

Once we have verified (under existing locking) that the relevant page is
already not in memory, we know it can't be dirty in memory. If someone
pulls it into memory after we observe it to be not there, it doesn't matter
to us as whatever transaction they are about to change can't be the one we
care about.
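
To make "ask the kernel for the specific byte" concrete, here is a rough
sketch of the lookup, written in Python rather than C for brevity. The
constants mirror the clog layout with the default 8 kB block size (2 status
bits per transaction, 32 pages per segment file, 4-hex-digit segment names).
The helper names clog_location and read_status_from_kernel are made up for
illustration and are not PostgreSQL functions, and the sketch leaves out the
"verify under existing locking that the page is not in memory" step entirely.

```python
import os

# Constants mirroring the clog layout with the default 8 kB block size.
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32
CLOG_XACT_BITMASK = (1 << CLOG_BITS_PER_XACT) - 1


def clog_location(xid):
    """Hypothetical helper: map xid to (segment name, byte offset, bit shift)."""
    pageno = xid // CLOG_XACTS_PER_PAGE
    segno = pageno // SLRU_PAGES_PER_SEGMENT
    page_in_segment = pageno % SLRU_PAGES_PER_SEGMENT
    byte_in_page = (xid % CLOG_XACTS_PER_PAGE) // CLOG_XACTS_PER_BYTE
    shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    offset = page_in_segment * BLCKSZ + byte_in_page
    return "%04X" % segno, offset, shift


def read_status_from_kernel(clog_dir, xid):
    """Hypothetical helper: pread() one byte and extract the 2-bit status.

    Assumes the caller has already established that the page is not in
    shared memory; nothing here takes any lock.
    """
    segment, offset, shift = clog_location(xid)
    fd = os.open(os.path.join(clog_dir, segment), os.O_RDONLY)
    try:
        data = os.pread(fd, 1, offset)  # a single byte, no buffer slot needed
    finally:
        os.close(fd)
    if len(data) != 1:
        raise OSError("short read from clog segment " + segment)
    return (data[0] >> shift) & CLOG_XACT_BITMASK
```

The returned value uses the usual encoding: 0 in progress, 1 committed,
2 aborted, 3 sub-committed. The point of the sketch is only that the read
path needs no slot in the SLRU buffer pool, so it cannot force a dirty page
to be written out and fsynced.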

Perhaps someone will want the same page later so that they can write to it,
and so will have to pull it in. But we have to play the odds, and the odds
are that a page already dirty in memory is more likely to be written to
again in the near future than another page which was not already dirty and
is only needed with read intent.

If we are wrong, all that happens is someone later on has to do the same
work that we would have had to do anyway, at no greater cost than if we did
it now. If we are right, we avoid an fsync to make room for a new page, and
then later on avoid someone else having to shove out the page we brought in
(or a different one) only to replace it with the same page we just wrote,
fsynced, and shoved out.

Is there a chance that, if we read a byte from the kernel when someone is
in the process of writing adjacent bytes (or writing the same byte, with
changes only to bits in it which we don't care about), the kernel will
deliver us something which is neither the old value nor the new value, but
some monstrosity?
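
To spell out why only a genuinely torn byte would be a problem, here is a
small Python illustration (the helper names are invented for this sketch).
Each clog byte holds four 2-bit status slots. If a concurrent writer changes
only a neighbouring slot, then the old byte and the new byte both yield the
same answer for our slot. This says nothing about what the kernel actually
guarantees; it only shows that either of the two legitimate values is
harmless.

```python
CLOG_BITS_PER_XACT = 2
CLOG_XACT_BITMASK = 0b11


def status_in_byte(byte, slot):
    """Extract the 2-bit status stored in the given slot (0..3) of a byte."""
    return (byte >> (slot * CLOG_BITS_PER_XACT)) & CLOG_XACT_BITMASK


def set_status(byte, slot, status):
    """Return a copy of byte with the given slot overwritten by status."""
    shift = slot * CLOG_BITS_PER_XACT
    cleared = byte & ~(CLOG_XACT_BITMASK << shift) & 0xFF
    return cleared | (status << shift)


# Our transaction lives in slot 1 and is already committed (status 1).
old_byte = set_status(0, 1, 0b01)
# A neighbour in slot 3 is concurrently marked aborted (status 2).
new_byte = set_status(old_byte, 3, 0b10)

# Whichever of the two bytes the read returns, our slot reads the same.
assert status_in_byte(old_byte, 1) == status_in_byte(new_byte, 1) == 0b01
```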

Cheers,

Jeff
