Re: Refactoring the checkpointer's fsync request queue

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Shawn Debnath <sdn(at)amazon(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring the checkpointer's fsync request queue
Date: 2019-02-22 22:48:53
Message-ID: 20190222224853.ahvm3bf52zqr7k7h@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-02-23 11:42:49 +1300, Thomas Munro wrote:
> On Sat, Feb 23, 2019 at 11:15 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2019-02-22 10:18:57 -0800, Shawn Debnath wrote:
> > > I think using callbacks is the better path forward than having md or
> > > other components issue an invalidate request for each and every segment
> > > which can get quite heavy handed for large databases.
> >
> > I'm not sure I buy this. Unlinking files isn't cheap, involves many disk
> > writes, etc. The cost of an inval request in comparison isn't
> > particularly large. Especially for relation-level (rather than database
> > level) truncation, per-segment invals will likely commonly be faster
> > than the sequential scan.
>
> Well even if you do it with individual segment cancel messages for
> relations, you still need a way to deal with whole-database drops
> (generating the cancels for every segment in every relation in the
> database would be nuts), and that means either exposing some structure
> to the requests, right? So the requests would have { request type,
> callback ID, db, opaque tag }, where request type is SYNC, CANCEL,
> CANCEL_WHOLE_DB, callback ID is used to find the function that
> converts opaque tags to paths, and db is used for handling
> CANCEL_WHOLE_DB requests where you need to scan the whole hash table.
> Right?

I'm ok with using callbacks to allow pruning for things like droping
databases. If we use callbacks, I don't see a need to explicitly include
the db in the request however? The callback can look into the opaque
tag, no? Also, why do we need a separation between request type and
callback? That seems like it'll commonly be entirely redundant?

> > > At the time of smgrinit(), mdinit() would call into sync and register
> > > it's callbacks with an ID. We can use the same value that we are using
> > > for smgr_which to map the callbacks. Each fsync request will then also
> > > accompany this ID which the sync mechanism will use to call handlers for
> > > resolving forget requests or obtaining paths for files.
> >
> > I'm not seeing a need to do this dynamically at runtime. Given that smgr
> > isn't extensible, why don't we just map callbacks (or even just some
> > switch based logic) based on some enum? Doing things at *init time has
> > more potential to go wrong, because say a preload_shared_library does
> > different things in postmaster than normal backends (in EXEC_BACKEND
> > cases).
>
> Yeah I suggested dynamic registration to avoid the problem that eg
> src/backend/storage/sync.c otherwise needs to forward declare
> md_tagtopath(), undofile_tagtopath(), slru_tagtopath(), ..., or maybe
> #include <storage/md.h> etc, which seemed like exactly the sort of
> thing up with which you would not put.

I'm not sure I understand. If we have a few known tag types, what's the
problem with including the headers with knowledge of how to implement
them into sync.c file?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-02-22 22:59:04 Re: Refactoring the checkpointer's fsync request queue
Previous Message Thomas Munro 2019-02-22 22:42:49 Re: Refactoring the checkpointer's fsync request queue