Re: about fsync in CLOG buffer write

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: 张广舟(明虚) <guangzhou(dot)zgz(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, 周正中(德歌) <dege(dot)zzz(at)alibaba-inc(dot)com>, 范孝剑(康贤) <funnyxj(dot)fxj(at)alibaba-inc(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, 窦贤明(执白) <xianming(dot)dxm(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>, 陈新坚(惧留孙) <xinjian(dot)chen(at)alibaba-inc(dot)com>
Subject: Re: about fsync in CLOG buffer write
Date: 2015-10-04 19:25:35
Message-ID: 20151004192535.GA22389@awork2.anarazel.de
Lists: pgsql-hackers

On 2015-10-04 12:14:05 -0700, Jeff Janes wrote:
> My (naive) expectation is that no additional locking is needed.
>
> Once we decide to consult the clog, we already know the transaction is no
> longer in progress, so it can't be in-flight to change that clog entry we
> care about because it was required to have done that already.

Other xids on the same page can still be in progress and those
concurrently might need to be written to.

> Once we have verified (under existing locking) that the relevant page is
> already not in memory, we know it can't be dirty in memory. If someone
> pulls it into memory after we observe it to be not there, it doesn't matter
> to us as whatever transaction they are about to change can't be the one we
> care about.

A concurrent process might have read the page from disk before our
write, i.e. it holds an unmodified copy of the page, and its future
writes will then overwrite the entry we wrote directly. I think there's
a bunch of related issues.

Such things are currently prevented by the I/O locks in slru.c.
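
To make the lost-update hazard concrete, here is a minimal sketch. It is not
slru.c code: the page size, the status constants, and the names set_status,
get_status and lost_update_demo are all hypothetical, and "disk" is just an
array. It only assumes several xids share one page at two status bits each.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for illustration only, not PostgreSQL's definitions. */
#define DEMO_PAGE_SIZE     16
#define BITS_PER_XACT      2
#define XACTS_PER_BYTE     4
#define STATUS_IN_PROGRESS 0u
#define STATUS_COMMITTED   1u

/* Store a two-bit status for xid in the given page image. */
static void set_status(uint8_t *page, unsigned xid, unsigned status)
{
    unsigned byteno = xid / XACTS_PER_BYTE;
    unsigned shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT;

    page[byteno] = (uint8_t) ((page[byteno] & ~(3u << shift)) | (status << shift));
}

/* Fetch the two-bit status for xid from the given page image. */
static unsigned get_status(const uint8_t *page, unsigned xid)
{
    unsigned byteno = xid / XACTS_PER_BYTE;
    unsigned shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT;

    return (page[byteno] >> shift) & 3u;
}

/* Replays the interleaving and returns the final on-"disk" status of xid 5. */
static unsigned lost_update_demo(void)
{
    uint8_t disk[DEMO_PAGE_SIZE] = {0};   /* stand-in for the file on disk */
    uint8_t buffer[DEMO_PAGE_SIZE];       /* stand-in for a shared buffer */

    /* 1. A concurrent process reads the still-unmodified page into memory. */
    memcpy(buffer, disk, DEMO_PAGE_SIZE);

    /* 2. We bypass the buffer and set xid 5 directly in the on-disk page. */
    set_status(disk, 5, STATUS_COMMITTED);

    /* 3. The other process updates xid 6, same page, in its stale copy. */
    set_status(buffer, 6, STATUS_COMMITTED);

    /* 4. The stale buffer is written back as a whole page. */
    memcpy(disk, buffer, DEMO_PAGE_SIZE);

    assert(get_status(disk, 6) == STATUS_COMMITTED);
    return get_status(disk, 5);
}
```

In this sketch lost_update_demo() returns STATUS_IN_PROGRESS: the direct
write for xid 5 is silently undone by the whole-page write-back, even though
nobody touched xid 5 again.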

> Is there a chance that, if we read a byte from the kernel when someone is
> in the process of writing adjacent bytes (or writing the same byte, with
> changes only to bits in it which we don't care about), the kernel will
> deliver us something which is neither the old value nor the new value, but
> some monstrosity?

Depends on the granularity of the write/read and the OS IIRC.

I don't think it's worth investing time and complexity to bypass SLRU in
certain cases. We should rather rewrite the thing completely.

Greetings,

Andres Freund
