Re: Refactoring the checkpointer's fsync request queue

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Shawn Debnath <sdn(at)amazon(dot)com>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2018-12-31 21:41:30
Message-ID: CAEepm=2qBbQXGeekhCY+W8m=fy-+eU_FAUzRCSPFKuZ2rKG1LQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Dec 2, 2018 at 1:46 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> > On Mon, Nov 26, 2018 at 11:47 PM Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > > On Fri, Nov 23, 2018 at 5:45 PM Thomas Munro
> > > <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > > I do have a new plan though...
> >
> > Ugh. The plan in my previous email doesn't work, I was confused about
> > the timing of the buffer header update. Back to the drawing board.
>
> Any chance to share the drawing board with the ideas? :)
>
> On the serious note, I assume you have plans to work on this during the next
> CF, right?

Indeed I am. Unfortunately, the solution to that deadlock eludes me still.

So, I have split this work into multiple patches. 0001 is a draft
version of some new infrastructure I'd like to propose, 0002 is the
thing originally described by the first two paragraphs in the first
email in this thread, and the rest I'll have to defer for now (the fd
passing stuff).

To restate the purpose of this work: I want to make it possible for
other patches to teach the checkpointer to fsync new kinds of files
that are accessed through the buffer pool. Specifically, undo segment
files (for zheap) and SLRU files (see Shawn Debnath's plan to put clog
et al into the standard buffer pool). The main changes are:

1. A bunch of stuff moved out of md.c into smgrsync.c, where the same
pendingOpTable machinery can be shared by any block storage
implementation.
2. The actual fsync'ing now happens by going through smgrimmedsync().
3. You can now tell the checkpointer to forget individual segments
(undo and slru both need to be able to do that when they truncate data
from the 'front').
4. The protocol for forgetting relations etc is slightly different:
if a file is found to be missing, we call AbsorbFsyncRequests() and
then probe to see if the segment number has disappeared from the set
(instead of checking cancel flags), though I still need to test this
case.
5. Requests (ie segment numbers) are now stored in a sorted vector,
because it doesn't make sense to store large and potentially sparse
integers in bitmapsets. See patch 0001 for new machinery to support
that.
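
To illustrate point 5, here is a minimal sketch of a sorted vector of
segment numbers with binary-search lookup, insertion and removal. It is
not the interface from patch 0001: the names SegnoVector, segvec_insert
and so on are hypothetical, and it uses plain realloc() rather than
memory contexts. The point is only that memory use grows with the number
of requests held, not with the largest segment number, which is what
makes it a better fit than a bitmapset for sparse values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the API from patch 0001. */
typedef uint32_t segno_t;

typedef struct SegnoVector
{
    segno_t    *data;       /* sorted ascending, no duplicates */
    size_t      size;       /* number of elements in use */
    size_t      capacity;   /* number of elements allocated */
} SegnoVector;

/* Return the index of the first element >= key (binary search). */
static size_t
segvec_lower_bound(const SegnoVector *v, segno_t key)
{
    size_t      lo = 0;
    size_t      hi;

    assert(v != NULL);
    hi = v->size;
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (v->data[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Probe: is this segment number still in the set? */
static bool
segvec_contains(const SegnoVector *v, segno_t key)
{
    size_t      i = segvec_lower_bound(v, key);

    return i < v->size && v->data[i] == key;
}

/* Insert key, keeping the vector sorted; false on allocation failure. */
static bool
segvec_insert(SegnoVector *v, segno_t key)
{
    size_t      i = segvec_lower_bound(v, key);

    if (i < v->size && v->data[i] == key)
        return true;            /* already present */
    if (v->size == v->capacity)
    {
        size_t      newcap = v->capacity ? v->capacity * 2 : 8;
        segno_t    *p = realloc(v->data, newcap * sizeof(segno_t));

        if (p == NULL)
            return false;
        v->data = p;
        v->capacity = newcap;
    }
    /* Shift the tail up by one slot to open a gap at position i. */
    memmove(&v->data[i + 1], &v->data[i], (v->size - i) * sizeof(segno_t));
    v->data[i] = key;
    v->size++;
    return true;
}

/* Forget a single segment; returns true if it was present. */
static bool
segvec_remove(SegnoVector *v, segno_t key)
{
    size_t      i = segvec_lower_bound(v, key);

    if (i == v->size || v->data[i] != key)
        return false;
    memmove(&v->data[i], &v->data[i + 1],
            (v->size - i - 1) * sizeof(segno_t));
    v->size--;
    return true;
}
```

Starting from a zero-initialized SegnoVector, inserting 1000000, 3 and
42 leaves three elements in ascending order, however large the values.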

The interfaces in 0001 are perhaps a bit verbose (and hard to fit in
80 columns). Maybe I need something better for memory contexts.
Speaking of which, it wasn't possible to do a guaranteed-no-alloc
merge (like the one done for zero-anchored bitmapset in commit
1556cb2fc), so I had to add a second vector for 'in progress'
segments. On the next attempt, if that second vector is found to be
non-empty, I merge it back into the main set. Very open to better
ideas on how to do any of this.
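
As a rough illustration of the 'in progress' idea, the sketch below
merges a second sorted vector back into the main one. It is an
assumption-laden toy, not the code in 0002: SortedSegs and
merge_in_progress are hypothetical names, and malloc() stands in for
whatever allocator the patch uses. Because merging two sorted arrays
needs a new buffer, the allocation can fail; here both inputs are left
untouched in that case, so the carried-over segments are not lost and
the merge can be retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch; names are not taken from the patches. */
typedef struct SortedSegs
{
    uint32_t   *data;       /* ascending, no duplicates */
    size_t      size;
} SortedSegs;

/*
 * Merge 'in_progress' into 'pending'.  The output buffer is allocated
 * before anything is modified, so on failure both vectors are unchanged
 * and the caller can simply try again later.
 */
static bool
merge_in_progress(SortedSegs *pending, SortedSegs *in_progress)
{
    size_t      i = 0;
    size_t      j = 0;
    size_t      n = 0;
    uint32_t   *out;

    assert(pending != in_progress);

    if (in_progress->size == 0)
        return true;            /* nothing was carried over */

    out = malloc((pending->size + in_progress->size) * sizeof(uint32_t));
    if (out == NULL)
        return false;

    /* Standard two-way merge, emitting equal values only once. */
    while (i < pending->size && j < in_progress->size)
    {
        uint32_t    a = pending->data[i];
        uint32_t    b = in_progress->data[j];

        if (a < b)
        {
            out[n++] = a;
            i++;
        }
        else if (b < a)
        {
            out[n++] = b;
            j++;
        }
        else
        {
            out[n++] = a;
            i++;
            j++;
        }
    }
    while (i < pending->size)
        out[n++] = pending->data[i++];
    while (j < in_progress->size)
        out[n++] = in_progress->data[j++];

    free(pending->data);
    pending->data = out;
    pending->size = n;
    in_progress->size = 0;      /* emptied; its buffer stays with the caller */
    return true;
}
```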

--
Thomas Munro
http://www.enterprisedb.com

Attachments:
  0001-Add-parameterized-vectors-and-sorting-searching-s-v4.patch  (application/octet-stream, 16.5 KB)
  0002-Refactor-the-fsync-machinery-to-support-future-SM-v4.patch  (application/octet-stream, 78.8 KB)
