Re: Refactoring the checkpointer's fsync request queue

From: Shawn Debnath <sdn(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2019-02-16 19:39:05
Message-ID: 20190216193905.GA53174@f01898859afd.ant.amazon.com
Lists: pgsql-hackers

On Fri, Feb 15, 2019 at 06:45:02PM -0800, Andres Freund wrote:

> > One of the advantages of that approach is that there are probably
> > other files that need to be fsync'd for each checkpoint that could
> > benefit from being offloaded to the checkpointer. Another is that you
> > break the strange cycle mentioned above.
>
> The other issue is that I think your approach moves the segmentation
> logic basically out of md into smgr. I think that's wrong. We shouldn't
> presume that every type of storage is going to have segmentation that's
> representable in a uniform way imo.

I had a discussion with Thomas on this and am working on a new version
of the patch that incorporates what you guys discussed at FOSDEM, but
avoiding passing pathnames to checkpointer.

The mdsync machinery will be moved out of md.c, and the pending-ops
table will incorporate the segment number as part of its key. I am still
deciding how to cleanly refactor _mdfd_getseg, which mdsync uses during
the file sync operations. The ultimate goal is to hand checkpointer a
file descriptor it can use to issue the fsync via FileSync. So perhaps a
function in smgr that returns just that, based on the RelFileNode, fork
and segno combination. Dealing only in file descriptors will also allow
us to implement passing FDs to checkpointer directly as part of the
request in the future.

The goal is to encapsulate relation-specific knowledge within md.c while
allowing undo and the generic block store (ex-SLRU) to do their own
mapping within the smgr layer later. Yes, checkpointer will "call back"
into smgr, but only to retrieve information that should be managed by
smgr, leaving checkpointer free to focus on its job of tracking requests
and syncing files via the fd interfaces.

> > Another consideration if we do that is that the existing scheme has a
> > kind of hierarchy that allows fsync requests to be cancelled in bulk
> > when you drop relations and databases. That is, the checkpointer
> > knows about the internal hierarchy of tablespace, db, rel, seg. If we
> > get rid of that and have just paths, it seems like a bad idea to teach
> > the checkpointer about the internal structure of the paths (even
> > though we know they contain the same elements encoded somehow). You'd
> > have to send an explicit cancel for every key; that is, if you're
> > dropping a relation, you need to generate a cancel message for every
> > segment, and if you're dropping a database, you need to generate a
> > cancel message for every segment of every relation.
>
> I can't see that being a problem - compared to the overhead of dropping
> a relation, that doesn't seem to be a meaningfully large cost?

With the scheme above, dropping hierarchies will require scanning the
hash table for entries matching the dboid or reloid and removing them.
We already do this today for FORGET_DATABASE_FSYNC in
RememberFsyncRequest; the matching function will belong in smgr. We can
measure how scanning the whole hash table affects performance and
iterate from there if needed.
--
Shawn Debnath
Amazon Web Services (AWS)
