Re: Refactoring the checkpointer's fsync request queue

From: Shawn Debnath <sdn(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2019-02-27 21:26:54
Message-ID: 20190227212653.GA7171@f01898859afd.ant.amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 22, 2019 at 03:57:42PM -0800, Andres Freund wrote:
> > Today, all the callbacks for smgr have their prototypes defined in
> > smgr.h and used in smgr.c. Forward declarations within the new sync.h
> > would be fine but having md callbacks split in two different places may
> > not be the cleanest approach? One could argue they serve different
> > purposes so perhaps it correct that we define them separately. I am
> > fine with either, but now I probably prefer the new enum to fixed
> > function declaration mappings that Andres suggested. I agree it would be
> > less error prone.
>
> I'd really just entirely separate the two. I'd not include any knowledge
> of this mechanism into smgr.h, and just make the expansion of paths
> dispatch statically dispatched (potentially into md.c) to actually do
> the work. I'm not sure why sync.h would need any forward declarations
> for that?

We had a quick offline discussion to get on the same page and we agreed
to move forward with Andres' approach above. Attached is patch v10.
Here's the overview of the patch:

1. Move the system for requesting and processing fsyncs out of md.c
into storage/sync/sync.c with definitions in include/storage/sync.h.
ProcessSyncRequests() is now responsible for processing the sync
requests during checkpoint.

2. Removed the need for specific storage managers to implement pre and
post checkpoint callbacks. These are now executed by the sync mechanism.

3. We now embed the fork number and the segment number as part of the
hash key for the pending ops table. This eliminates the bitmapset based
segment tracking for each relfilenode during fsync as not all storage
managers may map their segments from zero.

4. Each sync request now must include a type: sync, forget, forget
hierarchy, or unlink, and the owner who will be responsible for
generating paths or matching forget requests.

5. For cancelling relation sync requests, we now must send a forget
request for each fork and segment in the relation.

6. We do not rely on smgr to provide the file descriptor we use to
issue fsync. Instead, we generate the full path based on the FileTag
in the sync request and use PathNameOpenFile to get the file descriptor.

Ran make check-world and repeated the tests described in [1]. The
numbers show a 12% drop in total time for single run of 1000 clients and
~62% drop in total time for 10 parallel runs with 200 clients:

[Requests Absorbed]

Min Max Average Median Std Dev
--------------- -------- -------- ----------- --------- ----------
patch-1x1000 17554 147410 85252.95 83835.5 21898.88
master-1x1000 25728 138422 81455.04 80601 21295.83

Min Max Average Median Std Dev
patch-10x200 125922 279682 197098.76 197055 34038.25
master-10x200 191833 602512 416533.86 424946 82014.48

[Files Synced]

Min Max Average Median Std Dev
--------------- ------ ------ --------- -------- ---------
patch-1x1000 155 213 158.93 159 2.97
master-1x1000 154 166 158.29 159 10.29

Min Max Average Median Std Dev
patch-10x200 1456 1577 1546.84 1547 11.23
master-10x200 1546 1546 1546 1559 12.79

[Total Time in ProcessFsyncRequest/mdsync]

Min Max Average Median Std Dev
--------------- ------ --------- ---------- -------- ---------
patch-1x1000 606 4022 2145.11 2123 440.72
master-1x1000 806 4430.32 2458.77 2382 497.01

Min Max Average Median Std Dev
patch-10x200 2472 6960 4156.8 4141 817.56
master-10x200 4323 17858 10982.15 11154 2760.47

[1]
https://www.postgresql.org/message-id/20190220232739.GA8280%40f01898859afd.ant.amazon.com

--
Shawn Debnath
Amazon Web Services (AWS)

Attachment Content-Type Size
0001-Refactor-the-fsync-machinery-to-support-future-SMGR-v10.patch text/plain 72.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-02-27 21:27:04 Re: Why don't we have a small reserved OID range for patch revisions?
Previous Message Tom Lane 2019-02-27 20:47:03 Re: POC: converting Lists into arrays