Re: Extensible storage manager API - SMGR hook Redux

From: "Tristan Partin" <tristan(at)neon(dot)tech>
To: "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Matthias van de Meent" <boekewurm+postgres(at)gmail(dot)com>, "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>, "Andres Freund" <andres(at)anarazel(dot)de>
Subject: Re: Extensible storage manager API - SMGR hook Redux
Date: 2024-01-12 19:57:22
Message-ID: CYCZRFZES0OM.28GN7G5EDHOCN@neon.tech
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thought I would show off what is possible with this patchset.

Heikki, a couple of months ago in our internal Slack, said:

> [I would like] a debugging tool that checks that we're not missing any
> fsyncs. I bumped into a few missing fsync bugs with unlogged tables
> lately and a tool like that would've been very helpful.

My task was to create such a tool, and off I went. I started with the
storage manager extension patch that Matthias sent to the list last
year[0].

Andres, in another thread[1], said:

> I've been thinking that we need a validation layer for fsyncs, it's too hard
> to get right without testing, and crash testing is not likel enough to catch
> problems quickly / resource intensive.
>
> My thought was that we could keep a shared hash table of all files created /
> dirtied at the fd layer, with the filename as key and the value the current
> LSN. We'd delete files from it when they're fsynced. At checkpoints we go
> through the list and see if there's any files from before the redo that aren't
> yet fsynced. All of this in assert builds only, of course.

I took this idea and ran with it. I call it the fsync_checker™️. It is an
extension that prints relations that haven't been fsynced prior to
a CHECKPOINT. Note that this idea doesn't work in practice because
relations might not be fsynced, but they might be WAL-logged, like in
the case of createdb. See log_smgrcreate(). I can't think of an easy way
to solve this problem looking at the codebase as it stands.

Here is a description of the patches:

0001:

This is essentially just the patch that Matthias posted earlier, but
rebased and adjusted a little bit so storage managers can "inherit" from
other storage managers.

0002:

This is an extension of 0001, which allows for extensions to set
a global storage manager. This is pretty hacky, and if it was going to
be pulled into mainline, it would need some better protection. For
instance, only one extension should be able to set the global storage
manager. We wouldn't want extensions stepping over each other, etc.

0003:

Adds a hook for extensions to inspect a checkpoint before it actually
occurs. The purpose for the fsync_checker is so that I can iterate over
all the relations the extension tracks to find files that haven't been
synced prior to the completion of the checkpoint.

0004:

This is the actual fsync_checker extension itself. It must be preloaded.

Hopefully this is a good illustration of how the initial patch could be
used, even though it isn't perfect.

[0]: https://www.postgresql.org/message-id/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK%3DPCNWEMrA%40mail.gmail.com
[1]: https://www.postgresql.org/message-id/20220127182838.ba3434dp2pe5vcia%40alap3.anarazel.de

--
Tristan Partin
Neon (https://neon.tech)

Attachment Content-Type Size
v1-0001-Expose-f_smgr-to-extensions-for-manual-implementa.patch text/x-patch 31.7 KB
v1-0002-Allow-extensions-to-override-the-global-storage-m.patch text/x-patch 2.9 KB
v1-0003-Add-checkpoint_create_hook.patch text/x-patch 1.9 KB
v1-0004-Add-contrib-fsync_checker.patch text/x-patch 9.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-01-12 20:01:59 Re: Emit fewer vacuum records by reaping removable tuples during pruning
Previous Message Robert Haas 2024-01-12 19:47:11 Re: Emit fewer vacuum records by reaping removable tuples during pruning