Quick Links

Adding facility for injection points (or probe points?) for more advanced tests

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Adding facility for injection points (or probe points?) for more advanced tests
Date:	2023-10-25 04:13:38
Message-ID:	ZTiV8tn_MIb_H2rE@paquier.xyz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi all,

I don't remember how many times in the last few years when I've had to
hack the backend to produce a test case that involves a weird race
condition across multiple processes running in the backend, to be able
to prove a point or just test a fix (one recent case: 2b8e5273e949).
Usually, I come to hardcoding stuff for the following situations:
- Trigger a PANIC, to force recovery.
- A FATAL, to take down a session, or just an ERROR.
- palloc() failure injection.
- Sleep to slow down a code path.
- Pause and release with condition variable.

And, while that's helpful to prove a point on a thread, nothing comes
out of it in terms of regression test coverage in the tree because
these tests are usually too slow and expensive, as they usually rely
on hardcoded timeouts. So that's pretty much attempting to emulate
what one would do with a debugger in a predictable way, without the
manual steps because human hands don't scale well.

The reason behind that is of course more advanced testing, to be able
to expand coverage when we have weird and non-deterministic race
issues to deal with, and the code becoming more complex every year
makes that even harder. Fault and failure injection in specific paths
comes into mind, additionally, particularly if you manage complex
projects based on Postgres.

So, please find attached a patch set that introduces an in-core
facility to be able to set what I'm calling here an "injection point",
that consists of being able to register in shared memory a callback
that can be run within a defined location of the code. It means that
it is not possible to trigger a callback before shared memory is set,
but I've faced far more the case where I wanted to trigger something
after shmem is set anyway. Persisting an injection point across
restarts is also possible by adding some through an extension's shmem
hook, as we do for custom LWLocks for example, as long as a library is
loaded.

This will remind a bit of what Alexander Korotkov has proposed here:
https://www.postgresql.org/message-id/CAPpHfdtSEOHX8dSk9Qp%2BZ%2B%2Bi4BGQoffKip6JDWngEA%2Bg7Z-XmQ%40mail.gmail.com
Also, this is much closee to what Craig Ringer is mentioning here,
where it is named probe points, but I am using a minimal design that
allows to achieve the same:
https://www.postgresql.org/message-id/CAPpHfdsn-hzneYNbX4qcY5rnwr-BA1ogOCZ4TQCKQAw9qa48kA%40mail.gmail.com

A difference is that I don't really see a point in passing to the
callback triggered an area of data coming from the hash table itself,
as at the end a callback could just refer to an area in shared memory
or a static set of variables depending on what it wants, with one or
more injection points (say a location to set a state, and a second to
check it). So, at the end, the problem comes down in my opinion to
two things:
- Possibility to trigger a condition defined by some custom code, in
the backend (core code or even out-of-core).
- Possibility to define a location in the code where a named point
would be checked.

0001 introduces three APIs to create, run, and drop injection points:
+extern void InjectionPointCreate(const char *name,
+ InjectionPointCallback callback);
+extern void InjectionPointRun(const char *name);
+extern void InjectionPointDrop(const char *name);

Then one just needs to add a macro like that to trigger the callback
registered in the code to test:
INJECTION_POINT_RUN("String");
So the footprint in the core tree is not zero, but it is as minimal as
it can be.

I have added some documentation to explain that, as well. I am not
wedded to the name proposed in the patch, so if you feel there is
better, feel free to propose ideas.

This facility is hidden behind a specific configure/Meson switch,
making it a no-op by default:
--enable-injection-points
-Dinjection_points={ true | false }

0002 is a test module to test these routines, that I have kept a
maximum simple to ease review of the basics proposed here. This could
be extended further to propose more default modes with TAP tests on
its own, as I don't see a real point in having the SQL bits or some
common callbacks (like for the PANIC or the FATAL cases) in core.

Thoughts and comments are welcome.
--
Michael

Attachment	Content-Type	Size
0001-Base-facility-for-injection-points.patch	text/x-diff	17.2 KB
0002-Add-test-module-for-injection-points.patch	text/x-diff	13.5 KB

Responses

Re: Adding facility for injection points (or probe points?) for more advanced tests at 2023-10-25 04:36:17 from Amul Sul
Re: Adding facility for injection points (or probe points?) for more advanced tests at 2023-11-06 19:28:14 from Nazir Bilal Yavuz
Re: Adding facility for injection points (or probe points?) for more advanced tests at 2023-11-11 02:32:27 from Andres Freund
Re: Adding facility for injection points (or probe points?) for more advanced tests at 2023-11-20 11:23:45 from Ashutosh Bapat

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amul Sul	2023-10-25 04:36:17	Re: Adding facility for injection points (or probe points?) for more advanced tests
Previous Message	John Naylor	2023-10-25 04:08:59	Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound