Re: Refactoring the checkpointer's fsync request queue

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Shawn Debnath <sdn(at)amazon(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2019-02-22 22:42:49
Message-ID: CA+hUKG+7q=1wR+RuNUb1TSST-SjG4sr-_yfyny2MnsQSyY7x_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 23, 2019 at 11:15 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2019-02-22 10:18:57 -0800, Shawn Debnath wrote:
> > I think using callbacks is the better path forward than having md or
> > other components issue an invalidate request for each and every segment
> > which can get quite heavy handed for large databases.
>
> I'm not sure I buy this. Unlinking files isn't cheap, involves many disk
> writes, etc. The cost of an inval request in comparison isn't
> particularly large. Especially for relation-level (rather than database
> level) truncation, per-segment invals will likely commonly be faster
> than the sequential scan.

Well even if you do it with individual segment cancel messages for
relations, you still need a way to deal with whole-database drops
(generating the cancels for every segment in every relation in the
database would be nuts), and that means either exposing some structure
to the requests, right? So the requests would have { request type,
callback ID, db, opaque tag }, where request type is SYNC, CANCEL,
CANCEL_WHOLE_DB, callback ID is used to find the function that
converts opaque tags to paths, and db is used for handling
CANCEL_WHOLE_DB requests where you need to scan the whole hash table.
Right?

> > At the time of smgrinit(), mdinit() would call into sync and register
> > it's callbacks with an ID. We can use the same value that we are using
> > for smgr_which to map the callbacks. Each fsync request will then also
> > accompany this ID which the sync mechanism will use to call handlers for
> > resolving forget requests or obtaining paths for files.
>
> I'm not seeing a need to do this dynamically at runtime. Given that smgr
> isn't extensible, why don't we just map callbacks (or even just some
> switch based logic) based on some enum? Doing things at *init time has
> more potential to go wrong, because say a preload_shared_library does
> different things in postmaster than normal backends (in EXEC_BACKEND
> cases).

Yeah I suggested dynamic registration to avoid the problem that eg
src/backend/storage/sync.c otherwise needs to forward declare
md_tagtopath(), undofile_tagtopath(), slru_tagtopath(), ..., or maybe
#include <storage/md.h> etc, which seemed like exactly the sort of
thing up with which you would not put. (Which reminds me, smgr.{c,h}
suffers from a similar disease, as the comment says: "move me
elsewhere -- ay 7/94").

> Besides those two points, I think this is going in a good direction!

Phew. :-)

--
Thomas Munro
https://enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-02-22 22:48:53 Re: Refactoring the checkpointer's fsync request queue
Previous Message Joe Conway 2019-02-22 22:22:09 Re: oddity with ALTER ROLE/USER