Re: Refactoring the checkpointer's fsync request queue

From: Shawn Debnath <sdn(at)amazon(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andres Freund" <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2019-04-03 21:44:24
Message-ID: 20190403214423.GA45392@f01898859afd.ant.amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 04, 2019 at 01:08:31AM +1300, Thomas Munro wrote:
> On Tue, Apr 2, 2019 at 11:09 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > I'm going to do some more testing and tidying tomorrow (for example I
> > think the segment.h header is silly and I'd like to remove that), and
> > commit this.

Given the dislike in the thread for introducing the concept of segments
at any layer higher than the storage manager itself, I thought it would
be better to leave the existing header files alone and introduce a new
one to separate the concept. I am fine either way we go.

> As a sanity check on the programming interface this thing gives you, I
> tried teaching the SLRUs to use the fsync queue. I finished up making
> a few small improvements, but the main thing I learned is that
> "handler" needs to be part of the hash table key. I suppose the
> discriminator could even be inside FileTag itself, but I chose to keep
> it separate and introduce a new struct to hold hander enum + FileTag
> in the hash table.

I think this is fine, but can you elaborate a bit more on why we need to
include the handler in the key for the hash table? We are de-duping
relfilenodes here and these should never collide with files from another
component. The component separation would be encoded in the RelFileNode
that the target smgr, or SLRU in this case, would be able to decipher.
Do you foresee, or know of, use cases where FileTag alone will result in
conflicts on the same file but from different handlers?

+/*
+ * Populate a file tag describing a segment file. We only use the segment
+ * number, since we can derive everything else we need by having separate
+ * sync handler functions for clog, multixact etc.
+ */
+#define INIT_SLRUFILETAG(a,xx_segno) \
+( \
+ (a).rnode.spcNode = 0, \
+ (a).rnode.dbNode = 0, \
+ (a).rnode.relNode = 0, \
+ (a).forknum = 0, \
+ (a).segno = (xx_segno) \
+)

Based on the definition of INIT_SLRUFILETAG in your patch, it seems you
are trying to only use the segno to identify the file. Not sure why we
can't use other unused fields in FileTag to identify the component? You
could for example in the current SLRU implementation have the handler
set to SYNC_HANDLER_SLRU when invoking RegisterSyncRequest() and use
relNode to distinguish between each SLRU component in a wrapper function
and call slrusyncfiletag() with the right SlruCtl.

For the SLRU to buffer cache work, I was planning on using the relNode
field to identify which specific component this tag belongs to. dbNode
would be pointing to the type of smgr (as discussed earlier in the
thread and still TBD).

I would prefer not to expand the hash key unnecessarily and given this
isn't persisted, we can expand the key in the future if needed. Keeps
the code simpler for now.

> The 0001 patch is what I'm going to commit soon. I don't know why
> Shawn measured a performance change -- a bug in an earlier version? --
> I can't see it, but I'm planning to look into that a bit more first.
> I've attached the 0002 SLRU patch for interest, but I'm not planning
> to commit that one.

Question is which block of code did you measure? I can redo the
instrumentation on the latest patch and re-validate and share the
results. I previously measured the average time it took mdsync() and
ProcessSyncRequests() in the patch to complete under similar workloads.

> I found a few more things that I thought needed adjustment:
>
> * Packing handler and request type into a uint8 is cute but a waste of
> time if we're just going to put it in a struct next to a member that
> requires word-alignment. So I changed it to a pair of plain old int16
> members. The ftag member starts at offset 4 either way, on my system.

Good catch! For posterity, using packed attribute here would be bad well
as it would result RelFileNode's spcNode in FileTag in subsequent
entries to be misaligned given our usage of the requests as an array.
This is potentially unsafe on platforms other than x86. Re-arranging the
fields would lead to the same result. Thanks for catching and fixing
this!

> * I didn't really like the use of the word HIERARCHY in the name of
> the request type, and changed it to SYNC_FILTER_REQUEST. That word
> came up because we were implementing a kind of hierarchy, where if you
> drop a database you want to forget things for all segments inside all
> relations inside that database, but the whole point of this new API is
> that it doesn't understand that, it calls a filter function to decide
> which requests to keep. So I preferred "filter" as a name for the
> type of request.

Yeah, I never liked the word hierarchy too much, but couldn't think of a
better one either. Filter is perfect.

--
Shawn Debnath
Amazon Web Services (AWS)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-04-03 21:45:14 Re: Log a sample of transactions
Previous Message Greg Stark 2019-04-03 21:41:36 Re: Implementing Incremental View Maintenance