Quick Links

Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py

From:	Mats Kindahl <mats(dot)kindahl(at)gmail(dot)com>
To:	Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py
Date:	2025-11-04 09:31:31
Message-ID:	fe34a709-1a5a-449c-8da6-3189d1ac445c@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 11/3/25 10:55, Bertrand Drouvot wrote:
> Hi,
>
> On Thu, Oct 30, 2025 at 02:32:45PM +0100, Mats Kindahl wrote:
>> Hi all,
>>
>> Here is an updated set of patches based on the latest HEAD of PostgreSQL.
> Thanks for those patches and the initiative! Sorry to be late, I started
> to look at Coccinelle "for PostgreSQL" in [1] (to ensure some macros are used)
> and saw this thread.

Hi Bertrand and thank you for the comments.

> I did not look at the patches in detail, but sharing some thoughts.
>
> I agree that Coccinelle could be/is useful in relation to PostgreSQL development,
> but I think that we'd need to determine why it would useful to add the
> coccicheck.py script and the new dependencies in autoconf/meson.
>
> Coming back to your points and thinking if it could be used as a tool or integrated
> into the build system:
>
> 1) Identify and correct bugs in the source code both during development and
> review
>
> I agree that makes sense here. Having some bugs detected "automatically" would
> be great.
>
> 2) Make large-scale changes to the source tree to improve the code based on
> new insights
>
> I'm not sure we need to introduce coccicheck.py and add dependencies in
> autoconf/meson for this. The developer would need to know that he could use
> the Coccinelle and if he already knows then nothing prevents him from using it
> in his development tool box.
>
> 3) Encode and enforce APIs by ensuring that function calls are used
> correctly
>
> Same as 2) That said we could also imagine running yearly checks automatically
> too using coccicheck.py.
>
> 4) Use improved coding patterns for more efficient code
>
> Same as 3) from my point of view.
>
> 5) Allow extensions to automatically update code for later PostgreSQL
> versions
>
> Same as 2) from my point of view.
>
> So, I think that the current proposal (i.e build system integration) is a good
> fit for 1), less so for 3) and 4) and not necessarily needed for 2) and 5).
>
> The proposal will add new dependencies (as Michael stated up-thread) and introduce
> a new language (SmPL) that folks would need to be comfortable with to review
> the .cocci scripts.
>
> I don't have an answer to it but I think that the main question is: Should we
> integrate this into the build system, or just document it as an optional
> developer tool (wiki or such and provide .cocci scripts example)?

All of these are good points. My main reason for proposing this is that
I think that using Coccinelle more systematically would improve the
quality of the code and make it easier to maintain. The build system
changes and the coccicheck.py script is a proposal for how to make this
happen, but I realize there are alternative ways as well as tradeoffs to
be made (e.g., is the extra maintenance burden worth the improvement in
quality?)

I think there are three issues here:

1. Shall the semantic patches be in the source tree?
2. Shall we use the coccicheck.py script?
3. Shall we support this in the build system, that is, meson and automake?

Putting the coccicheck.py script aside for a short while, I think there
is value in keeping the semantic patches in the source tree rather than
adding them to a wiki for the following reasons:

It makes it easy for reviewers to check that the code reviewed follows
"best practices" (of course, assuming that the .cocci scripts represent
"best practices") instead of having to download and run these scripts
from another source. Having them on a separate page would reduce the
value since it makes it more difficult to use.

If they are part of the repository the .cocci scripts will be maintained
and updated to match the code in the source tree. If they are on a
separate wiki page, or a different repository, they are likely to be
version dependent and code might change so that they are not relevant
any more, which makes it more difficult to use them for reviewing and
checking code since that would also require updating them to match the
new code before applying them.

A drawback is that it is an extra maintenance burden (probably small,
but extra work nevertheless) and would require the semantic patches to
be executed regularly, by the build system or by reviewers, and check
that they do not report anything. If these checks are not done
regularly, it is possible that the scripts would become obsolete after a
while. By making it easy to run the scripts, we are more likely to run
them and discover issues in the scripts as well as in the code.

About integrating it with the build system. Note that there are three
modes: report, context, and (generate) patch. The important modes are
report and patch.

The main advantage of integrating it with the build system is is that it
is easy to run, which makes it easy to ensure that the semantic patches
do not report anything for both reviewers, developers, and build
systems. This will improve the quality of both the semantic patches and
the source code.

As you say, it does add an extra dependency, but this should optional
and only be used if spatch is installed. For parties not interested in
using spatch, hence does not have spatch installed, it will not be used
so it should not interfere with them.

Integrating it with the build system would also allow builders and
reviewers to use the same method for checking the code using report mode
without regard to what version is being used, you would run the checks
the same way for each minor release, for example.

As you say, coccicheck.py script is strictly speaking not needed to use
Coccinelle and is more intended as a convenience. The advantages of
using something like the coccicheck.py script is that:

The coccicheck.py script reads options from the semantic patches and
uses them when running spatch. If using spatch directly, you would have
to check the scripts, extract the options, and then use them when
running spatch. Not impossible, but an extra step that makes it more
inconvenient to use Coccinelle, which in turn makes it less likely to be
used and maintained.

You could automate the extraction of options from the semantic patches,
but this script provides a platform-independent method to run the checks
for reviewers, developers, and build systems. Note that different
semantic patches can use different options, so if you want to automate
this, you would need to write something like coccicheck.py anyway.

Best wishes,
Mats Kindahl

>
> [1]:https://www.postgresql.org/message-id/aQMtR/m4kW4Rkul%2B%40ip-10-97-1-34.eu-west-3.compute.internal
>
> Regards,
>

In response to

Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py at 2025-11-03 09:55:02 from Bertrand Drouvot

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	John Naylor	2025-11-04 09:35:14	Re: [PATCH v1 1/1] PostgreSQL Patch: AVX-Optimized ASCII Validation
Previous Message	John Naylor	2025-11-04 09:27:08	Re: Confine vacuum skip logic to lazy_scan_skip