Re: Refactoring the checkpointer's fsync request queue

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, sdn(at)amazon(dot)com, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2018-11-13 17:04:23
Message-ID: CA+Tgmobtym=iXUV8Lb-ifnxqpfKTiJBPSWuqryutFqUpuqL7AQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 11, 2018 at 9:59 PM Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> There is one major problem with this patch

If there's only one, you're doing great! Although admittedly this
seems like a big one...

> 1. Go back to the current pressure-valve strategy: make the sending
> side perform the fsync(), if it can't immediately write to the pipe.

As you say, this will happen significantly more often with
deduplication. That deduplication logic got added in response to a
real need. Before that, you could cause an individual backend to
start doing its own fsyncs() with something as simple as a bulk load.
The queue would absorb most of them, but not all, and the performance
ramifications where noticeable.

> 2. Offload the BufferSync() work to bgwriter, so the checkpointer can
> keep draining the pipe. Communication between checkpointer and
> bgwriter can be fairly easily multiplexed with the pipe draining work.

That sounds a little like you are proposing to go back to the way
things were before 806a2aee3791244bf0f916729bfdb5489936e068 (and,
belatedly, bf405ba8e460051e715d0a91442b579e590328ce) although I guess
the division of labor wouldn't be quite the same.

> 3. Multiplex the checkpointer's work: Use LWLockConditionalAcquire()
> when locking buffers, and if that fails, try to drain the pipe, and
> then fall back to a LWLockTimedAcquire(), drain pipe, repeat loop. I
> can hear you groan already; that doesn't seem particularly elegant,
> and there are portability problems implementing LWLockTimedAcquire():
> semtimedop() and sem_timedwait() are not available on all platforms
> (eg macOS). Maybe pthread_timed_condwait() could help (!).

You don't really need to invent LWLockTimedAcquire(). You could just
keep retrying LWLockConditionalAcquire() in a delay loop. I agree
that doesn't seem particularly elegant, though.

I still feel like this whole pass-the-fds-to-the-checkpointer thing is
a bit of a fool's errand, though. I mean, there's no guarantee that
the first FD that gets passed to the checkpointer is the first one
opened, or even the first one written, is there? It seems like if you
wanted to make this work reliably, you'd need to do it the other way
around: have the checkpointer (or some other background process) open
all the FDs, and anybody else who wants to have one open get it from
the checkpointer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-11-13 17:11:42 Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation
Previous Message Paul Ramsey 2018-11-13 16:31:00 Re: Changing SQL Inlining Behaviour (or...?)