Re: Adding facility for injection points (or probe points?) for more advanced tests

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding facility for injection points (or probe points?) for more advanced tests
Date: 2024-01-09 04:39:06
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(Compiled two separate review emails into a single one)

On Tue, Jan 02, 2024 at 03:36:12PM +0530, Ashutosh Bapat wrote:
> This discussion has not been addressed in v6. I think the interface
> needs to be documented in the order below
> INJECTION_POINT - this declares an injection point - i.e. a place in
> code where an external code can be injected (and run).
> InjectionPointAttach() - this is used to associate ("attach") external
> code to an injection point.
> InjectionPointDetach() - this is used to disassociate ("detach")
> external code from an injection point.
> [arguments about doc organization]
> Even if an INJECTION_POINT is not "declared" attach would succeed but
> doesn't do anything. I think this needs to be made clear in the
> documentation. Better if we could somehow make Attach() fail if the
> specified injection point is not "declared" using INJECTION_POINT. Of
> course we don't want to bloat the hash table with all "declared"
> injection points even if they aren't being attached to and hence not
> used.

Okay, I can see your point. I have reorganized the docs in the
following order:
- Attach
- Detach

> I think, exposing the current injection point strings as
> #defines and encouraging users to use these macros instead of string
> literals will be a good start.

Nah, I disagree with this one actually. It is easy to grep for the
macro INJECTION_POINT to be able to achieve the same research job, and
this would make the code more inconsistent when callbacks are run
within extensions which don't care about a #define in a backend

> With the current implementation it's possible to "declare" injection
> point with same name at multiple places. It's useful but is it
> intended?

Yes. I would recommend not doing that, but I don't see why there
would be a point in restricting that, either.

> /* Field sizes */
> #define INJ_NAME_MAXLEN 64
> #define INJ_LIB_MAXLEN 128
> #define INJ_FUNC_MAXLEN 128
> I think these limits should be either documented or specified in the
> error messages for users to fix their code in case of
> errors/unexpected behaviour.

Adding them to the error messages when attaching is a good idea.

> This is not an array entry anymore. I think we should rename
> InjectionPointEntry as SharedInjectionPointEntry and InjectionPointArrayEntry
> as LocalInjectionPointEntry.

Yep, fixed.

> +/* utilities to handle the local array cache */
> +static void
> +injection_point_cache_add(const char *name,
> + InjectionPointCallback callback)
> +{
> ... snip ...
> +
> + entry = (InjectionPointCacheEntry *)
> + hash_search(InjectionPointCache, name, HASH_ENTER, &found);
> +
> + if (!found)
> The function is called only when the injection point is not found in the local
> cache. So this condition will always be true. An Assert will help to make it
> clear and also prevent an unintended callback replacement.

Right, as coded that seems pointless to make the found conditional. I
think that I coded it this way when doing some earlier work with this
code, and finished with a simpler thing.

> +static bool
> +file_exists(const char *name)
> There's similar function in jit.c and dfmgr.c. Can we not reuse that code?

This has been mentioned in a different comment. Refactored as of
0001, but there is something here related to EACCES for the JIT path.
Seems weird to me that we would not fail if the JIT library cannot be
accessed when stat() fails.

> + /* Save the entry */
> + memcpy(entry_by_name->name, name, sizeof(entry_by_name->name));
> + entry_by_name->name[INJ_NAME_MAXLEN - 1] = '\0';
> + memcpy(entry_by_name->library, library, sizeof(entry_by_name->library));
> + entry_by_name->library[INJ_LIB_MAXLEN - 1] = '\0';
> + memcpy(entry_by_name->function, function, sizeof(entry_by_name->function));
> + entry_by_name->function[INJ_FUNC_MAXLEN - 1] = '\0';
> Most of the code is using strncpy instead of memcpy. Why is this code different?

strncpy() is less used in the backend code. It comes to a matter of
taste, IMO.

> + injection_callback = injection_point_cache_get(name);
> + if (injection_callback == NULL)
> + {
> + char path[MAXPGPATH];
> +
> + /* Found, so just run the callback registered */
> The condition indicates that the callback was not found. Comment looks wrong.


> Consider case
> Backend 2
> InjectionPointAttach("xyz", "abc", "pqr");
> Backend 1
> Backend 2
> InjectionPointDetach("xyz");
> InjectionPointAttach("xyz", "uvw", "lmn");
> Backend 1
> IIUC, the last INJECTION_POINT would run abc.pqr instead of uvw.lmn.
> Am I correct?

Yeah, that's an intended design choice to keep the code simpler and
faster as there is no need to track the library and function names in
the local caches or implement something similar to invalidation
messages for this facility because it would impact performance anyway
in the call paths. In short, just don't do that, or use two distinct

On Thu, Jan 04, 2024 at 06:22:35PM +0530, Ashutosh Bapat wrote:
> One more comment on 0001
> InjectionPointAttach() doesn't test whether the given function exists
> in the given library. Even if InjectionPointAttach() succeeds,
> INJECTION_POINT might throw error because the function doesn't exist.
> This can be seen as an unwanted behaviour. I think
> InjectionPointAttach() should test whether the function exists and
> possibly load it as well by adding it to the local cache.

This has the disadvantage of filling the local cache but that may not
be necessary with an extra load_external_function() in the attach
path. I agree to make things safer, but I would do that when
attempting to run the callback instead. Perhaps there's an argument
for the case of somebody replacing a library on-the-fly. I don't
really buy it, but people like doing fancy things sometimes.

> --- /dev/null
> +++ b/src/test/modules/test_injection_points/expected/test_injection_points.out
> When built without injection point support, this test fails. We should
> add an alternate output file for such a build so that the behaviour
> with and without injection point support is tested. Or set up things
> such that the test is not run under make check in that directory. I
> will prefer the first option.

src/test/modules/Makefile has a safeguard for ./configure, and there's
one in test_injection_points/ for Meson. The test is not
run when the switches are not used, rather than using an alternate
output file. There was a different issue when moving the tests to
src/test/recovery/, though, where we need to make the execution of the
tests conditional on get_option('injection_points').

> +
> +SELECT test_injection_points_run('TestInjectionError'); -- error
> +ERROR: error triggered for injection point TestInjectionError
> +-- Re-load and run again.
> What's getting Re-loaded here? \c will create a new connection and
> thus a new backend. Maybe the comment should say "test in a fresh
> backend" or something of that sort?

The local cache is reloaded. Reworded.

> Looks confusing to me, we are testing the one removed as well. Am I
> missing something?
> [...]
> We aren't removing all entries TestInjectionLog2 is still there. Am I
> missing something?

Reworded all that.

> +# after recovery, the server will not start, and log PANIC: could not
> locate a valid checkpoint record
> IIUC the comment describes the behaviour with 7863ee4def65 reverted.
> But the test after this comment is written for the behaviour with
> 7863ee4def65. That's confusing. Is the intent to describe both
> behaviours in the comment?

This came from the original test case posted on the thread that
treated this bug. There's more that bugs me for this script that I
would like to polish. Let's focus on 0001 and 0002 for now..

> According to the prologue of ConditionVariableSleep(), that function
> should be called in a loop checking for the desired condition. All the
> callers that I examined follow that pattern. I think we need to follow
> that pattern here as well.
> Below comment from ConditionVariableTimedSleep() makes me think that
> the caller of ConditionVariableSleep() can be woken up even if the
> condition variable was not signaled. That's why the while() loop
> around ConditionVariableSleep().

That's the thing here, we don't have an extra condition to check
after. The variable sleep is what triggers the stop. :)
Perhaps this could be made smarter or with something else, I'm OK to
revisit that with the polishing for 0003 I'm planning. We could use a
separate shared state, for example, but that does not improve the test
readability, either.

Attached is a v7 series. What do you think? 0004 and 0005 for the
extra tests still need more discussion and much more polishing, IMO.

Attachment Content-Type Size
v7-0001-Refactor-code-to-check-file-file-existence.patch text/x-diff 5.2 KB
v7-0002-Add-backend-facility-for-injection-points.patch text/x-diff 23.1 KB
v7-0003-Add-test-module-injection_points.patch text/x-diff 13.9 KB
v7-0004-Add-regression-test-to-show-snapbuild-consistency.patch text/x-diff 4.7 KB
v7-0005-Add-basic-test-for-promotion-and-restart-race-con.patch text/x-diff 10.6 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-01-09 04:55:07 Re: verify predefined LWLocks have entries in wait_event_names.txt
Previous Message Alexander Lakhin 2024-01-09 04:00:01 Re: Removing unneeded self joins