|From:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>|
|To:||Thomas Munro <thomas(dot)munro(at)gmail(dot)com>|
|Cc:||pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>|
|Subject:||Re: SLRUs in the main buffer pool, redux|
On 25/07/2022 09:54, Heikki Linnakangas wrote:
> I'll write a separate post with my thoughts on the high-level design of
> this, ...
This patch represents each SLRU as a relation. The CLOG is one relation,
pg_subtrans is another relation, and so forth. The SLRU relations use a
different SMGR implementation, which is implemented in slru.c.
As you know, I'd like to make the SMGR implementation replaceable by
extensions. We need that for Neon, and I'd imagine it to be useful for
many other things, too, like compression, encryption, or restoring data
from a backup on-demand. I'd like all file operations to go through the
smgr API as much as possible, so that an extension can intercept SLRU
file operations too. If we introduce another internal SMGR
implementation, then an extension would need to replace both
implementations separately. I'd prefer to use the current md.c
implementation for SLRUs too, instead.
Thus I propose:
Let's represent each SLRU *segment* as a separate relation, giving each
SLRU segment a separate relNumber. Then we can use md.c for SLRUs, too.
Dropping an SLRU segment can be done by calling smgrunlink(). You won't
need to deal with missing segments in md.c, because each individual SLRU
file is a complete file, with no holes. Dropping buffers for one SLRU
segment can be done with DropRelationBuffers(), instead of introducing
the new DiscardBuffer() function. You can let md.c handle the caching of
the file descriptors, you won't need to reimplement that with
SLRUs won't need the segmentation into 1 GB segments that md.c does,
because each SLRU file is just 256 kB in size. That's OK. (BTW, I
propose that we bump the SLRU segment size up to a whopping 1 MB or even
more, while we're at it. But one step at a time.)
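
To make the arithmetic concrete, here is a minimal sketch of how an SLRU page number could be split into a per-segment relation and a block within it. It assumes the default 8 kB block size and 32 pages per segment (which gives the 256 kB mentioned above). The struct, the function names, and the choice of using the segment number directly as the relNumber are all hypothetical, for illustration only; they are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192                 /* default PostgreSQL block size */
#define SLRU_PAGES_PER_SEGMENT 32   /* 32 * 8 kB = 256 kB per segment file */

/* Hypothetical: location of an SLRU page when each segment is its own relation. */
typedef struct SketchSlruPageLocation
{
    uint32_t segment_rel_number;    /* assumed mapping: relNumber == segment number */
    uint32_t block_in_segment;      /* block number inside that segment's file */
} SketchSlruPageLocation;

/* Split an SLRU page number into (segment, block within segment). */
static SketchSlruPageLocation
sketch_slru_locate_page(uint32_t pageno)
{
    SketchSlruPageLocation loc;

    loc.segment_rel_number = pageno / SLRU_PAGES_PER_SEGMENT;
    loc.block_in_segment = pageno % SLRU_PAGES_PER_SEGMENT;

    /* Each segment is a complete file, so the block is always in range. */
    assert(loc.block_in_segment < SLRU_PAGES_PER_SEGMENT);
    return loc;
}

/* Size of one full segment file in bytes. */
static uint32_t
sketch_slru_segment_bytes(void)
{
    return (uint32_t) SLRU_PAGES_PER_SEGMENT * BLCKSZ;
}
```

Because every segment holds at most 32 blocks, md.c's own 1 GB segmentation would never kick in for such a file.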
SLRUs also won't need the concept of relation forks. That's fine, we can
just use MAIN_FORKNUM. Related to that, I'm somewhat bothered by the way
that SMgrRelation currently bundles all the relation forks together. A
comment in smgr.h says:
> smgr.c maintains a table of SMgrRelation objects, which are essentially
> cached file handles.
But when we introduced relation forks, that got a bit muddled. Each
SMgrRelation object is now a file handle for a bunch of related relation
forks, and each fork is a separate file that can be created and
That means that an SMGR implementation, like md.c, needs to track the
file handles for each fork. I think things would be more clear if we
unbundled the forks at the SMGR level, so that we would have a separate
SMgrRelation struct for each fork. And let's rename it to SMgrFile to
make the role more clear. I think that would reduce the confusion when
we start using it for SLRUs; an SLRU is not a relation, after all. md.c
would still segment each logical file into 1 GB segments, but it would
not need to deal with forks.
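
As a rough illustration of the unbundling, the sketch below keys each cached handle by (relNumber, fork) rather than by relation alone, so that one handle corresponds to exactly one logical file. This is a toy model, not the code in the attached patch: SketchSMgrFile, sketch_smgr_open() and the fixed-size table are hypothetical names, and the enum only mirrors PostgreSQL's fork numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's fork numbers. */
typedef enum SketchForkNumber
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
} SketchForkNumber;

/* Hypothetical per-file handle: one entry per (relation, fork) pair. */
typedef struct SketchSMgrFile
{
    uint32_t         rel_number;
    SketchForkNumber forknum;
    int              num_open_segments;  /* md.c-style 1 GB segments, per file */
} SketchSMgrFile;

#define SKETCH_MAX_FILES 16

static SketchSMgrFile sketch_files[SKETCH_MAX_FILES];
static int sketch_nfiles = 0;

/* Return the cached handle for this (relation, fork), creating it if needed. */
static SketchSMgrFile *
sketch_smgr_open(uint32_t rel_number, SketchForkNumber forknum)
{
    assert(forknum >= MAIN_FORKNUM && forknum <= INIT_FORKNUM);

    for (int i = 0; i < sketch_nfiles; i++)
    {
        if (sketch_files[i].rel_number == rel_number &&
            sketch_files[i].forknum == forknum)
            return &sketch_files[i];
    }

    if (sketch_nfiles == SKETCH_MAX_FILES)
        return NULL;            /* toy table is full */

    SketchSMgrFile *file = &sketch_files[sketch_nfiles++];
    file->rel_number = rel_number;
    file->forknum = forknum;
    file->num_open_segments = 0;
    return file;
}
```

In this model an SLRU segment would simply be opened with MAIN_FORKNUM, and the handle never has to carry per-fork arrays.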
Attached is a draft patch to refactor it that way, and a refactored
version of your SLRU patch over that.
The relation cache now needs to hold a separate reference to the
SMgrFile of each fork of a relation. And smgr cache invalidation still
works at relation granularity. Doing it per SMgrFile would be cleaner
in smgr.c, but in practice all the forks of a relation are unlinked and
truncated together, so sending a separate invalidation event for each
SMgrFile would increase the cache invalidation traffic.
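
A small sketch of what relation-granularity invalidation means when the cache holds per-fork entries: a single event carrying only the relNumber invalidates every fork's entry for that relation. The struct and function names are hypothetical and do not reflect the actual sinval machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached per-fork entry. */
typedef struct SketchFileEntry
{
    uint32_t rel_number;
    int      forknum;
    bool     valid;
} SketchFileEntry;

/*
 * Handle one relation-level invalidation event: mark all cached entries
 * belonging to rel_number invalid, whatever their fork. Returns the number
 * of entries that were invalidated.
 */
static int
sketch_invalidate_relation(SketchFileEntry *entries, int nentries,
                           uint32_t rel_number)
{
    int invalidated = 0;

    assert(entries != NULL || nentries == 0);

    for (int i = 0; i < nentries; i++)
    {
        if (entries[i].valid && entries[i].rel_number == rel_number)
        {
            entries[i].valid = false;
            invalidated++;
        }
    }
    return invalidated;
}
```

With per-SMgrFile invalidation, the same unlink or truncation would instead need one event per fork.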
In passing, I moved the DropRelationBuffers() calls from smgr.c to
the callers. smgr.c doesn't otherwise make any effort to keep the buffer
manager in sync with the state on-disk, that responsibility is normally
with the code that *uses* the smgr functions, so I think that's more
The first patch currently causes the '018_wal_optimize.pl' test to fail.
I guess I messed up something in the relation truncation code, but I
haven't investigated it yet. I wanted to post this to get comments on
the design, before spending more time on that.
What do you think?