Re: Refactoring the checkpointer's fsync request queue

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, sdn(at)amazon(dot)com, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2018-11-13 18:43:30
Message-ID: CA+TgmoYVEAUxNGwdsBJ8BXh5UwnsnixuDvZ_tugunkgF-AG+NA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 13, 2018 at 1:07 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2018-11-13 12:04:23 -0500, Robert Haas wrote:
> > I still feel like this whole pass-the-fds-to-the-checkpointer thing is
> > a bit of a fool's errand, though. I mean, there's no guarantee that
> > the first FD that gets passed to the checkpointer is the first one
> > opened, or even the first one written, is there?
> I'm not sure I understand the danger you're seeing here. It doesn't have
> to be the first fd opened, it has to be an fd that's older than all the
> writes that we need to ensure made it to disk. And that ought to be
> guaranteed by the logic? Between the FileWrite() and the
> register_dirty_segment() (and other relevant paths) the FD cannot be
> closed.

Suppose backend A and backend B open a segment around the same time.
Is it possible that backend A does a write before backend B, but
backend B's copy of the fd reaches the checkpointer before backend A's
copy? If you send the FD to the checkpointer before writing anything
then I think it's fine, but if you write first and then send the FD to
the checkpointer I don't see what guarantees the ordering.

> > It seems like if you wanted to make this work reliably, you'd need to
> > do it the other way around: have the checkpointer (or some other
> > background process) open all the FDs, and anybody else who wants to
> > have one open get it from the checkpointer.
>
> That'd require a process context switch for each FD opened, which seems
> clearly like a no-go?

I don't know how bad that would be. But hey, no cost is too great to
pay as a workaround for insane kernel semantics, right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-11-13 18:56:53 Re: Sync ECPG scanner with core
Previous Message Tom Lane 2018-11-13 18:21:29 Re: Sync ECPG scanner with core