walmethods.c/h are doing some strange things

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: walmethods.c/h are doing some strange things
Date: 2022-09-02 15:52:38
Message-ID: CA+TgmoZS0Kw98fOoAcGz8B9iDhdqB4Be4e=vDZaJZ5A-xMYBqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

We have a number of places in the system where we are using
object-oriented design patterns. For example, a foreign data wrapper
returns a table of function pointers which are basically methods for
operating on a planner or executor node that corresponds to a foreign
table that uses that foreign data wrapper. More simply, a
TupleTableSlot or TableAmRoutine or bbstreamer or bbsink object
contains a pointer to a table of callbacks which are methods that can
be applied to that object. walmethods.c/h also try to do something
sort of like this, but I find the way that they do it really weird,
because while Create{Directory|Tar}WalMethod() does return a table of
callbacks, those callbacks aren't tied to any specific object.
Instead, each set of callbacks refers to the one and only object of
that type that can ever exist, and the pointer to that object is
stored in a global variable managed by walmethods.c. So whereas in
other cases we give you the object and then a way to get the
corresponding set of callbacks, here we only give you the callbacks,
and we therefore have to impose the artificial restriction that there
can only ever be one object.

I think it would be better to structure things so that Walfile and
WalWriteMethod function as abstract base classes; that is, each is a
struct containing those members that are common to all
implementations, and then each implementation extends that struct with
whatever additional members it needs. One advantage of this is that it
would allow us to simplify the communication between receivelog.c and
walmethods.c. Right now, for example, there's a get_current_pos()
method in WalWriteMethods. The way that works is that
WalDirectoryMethod has a struct where it stores a 'curpos' value that
is returned by this method, and WalTrMethod has a different struct
that also stores a 'currpos' value that is returned by this method.
There is no real benefit in having the same variable in two different
structs and having to access it via a callback when we could just put
it into a common struct and access it directly. There's also a
compression_algorithm() method which has exactly the same issue,
though that is an overall property of the WalWriteMethod rather than a
per-Walfile property. There's also a getlasterr callback which is
basically just duplicate code across the two implementations; we could
unify that code. There's also a global variable current_walfile_name[]
in receivelog.c which only needs to exist because the file name is
inconveniently hidden inside the WalWriteMethod abstraction layer; we
can just make it visible.

Attached are a couple of hastily-written patches implementing this.
There might be good arguments for more thoroughly renaming some of the
things these patches touch, but I thought that doing any more renaming
would make it less clear what the core of the change is, so I'm
posting it like this for now. One thing I noticed while writing these
patches is that the existing code isn't very clear about whether
"Walfile" is supposed to be an abstraction for a pointer to the
implementation-specific struct, or the struct itself. From looking at
walmethods.h, you'd think it's a pointer to the struct, because we
declare typedef void *Walfile. walmethods.c agrees, but receivelog.c
takes a different view, declaring all of its variables as type
"Walfile *". This doesn't cause a compiler error because void * is
just as interchangeable with void ** as it is with DirectoryMethodFile
* or TarMethodFile *, but I think it is clearly a mistake, and the
approach I'm proposing here makes such mistakes more difficult to
make.

Aside from the stuff that I am complaining about here which is mostly
stylistic, I think that the division of labor between receivelog.c and
walmethods.c is questionable in a number of ways. There are things
which are specific to one walmethod or the other that are handled in
the common code (receivelog.c) rather than the type-specific code
(walmethod.c), and in general it feels like receivelog.c knows way too
much about what is really happening beneath the abstraction layer that
walmethods.c supposedly creates. This comment is one of the clearer
examples of this:

/*
* When streaming to files, if an existing file exists we verify that it's
* either empty (just created), or a complete WalSegSz segment (in which
* case it has been created and padded). Anything else indicates a corrupt
* file. Compressed files have no need for padding, so just ignore this
* case.
*
* When streaming to tar, no file with this name will exist before, so we
* never have to verify a size.
*/

There's nothing generic here. We're not describing an algorithm that
could be used with any walmethod that might exist now or in the
future. We're describing something that will produce the right result
given the two walmethods we actually have and the actual behavior of
the callbacks of each one. I don't really know what to do about this
part of the problem; these pieces of code are deeply intertwined in
complex ways that don't seem simple to untangle. Maybe I'll have a
better idea later, or perhaps someone else will. For now, I'd like to
get some thoughts on the attached refactoring patches that deal with
some more superficial aspects of the problem.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-walmethods.c-h-Make-Walfile-an-extensible-struct-.patch application/octet-stream 16.8 KB
v1-0002-walmethods.c-h-Make-WalWriteMethod-an-extensible-.patch application/octet-stream 53.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-09-02 15:55:51 Re: Remove dead code from sepgsql
Previous Message Bruce Momjian 2022-09-02 15:46:32 Re: [PATCH] docs: Document the automatically generated names for indices