Re: Adding facility for injection points (or probe points?) for more advanced tests

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding facility for injection points (or probe points?) for more advanced tests
Date: 2024-01-11 04:11:44
Message-ID: ZZ9qgAO-oKvz-lC4@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 10, 2024 at 03:21:03PM +0530, Ashutosh Bapat wrote:
> On Tue, Jan 9, 2024 at 10:09 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>> +#ifdef USE_INJECTION_POINTS
>>> +static bool
>>> +file_exists(const char *name)
>>>
>>> There's similar function in jit.c and dfmgr.c. Can we not reuse that code?
>>
>> This has been mentioned in a different comment. Refactored as of
>> 0001, but there is something here related to EACCES for the JIT path.
>> Seems weird to me that we would not fail if the JIT library cannot be
>> accessed when stat() fails.
>
> I agree with this change to jit. Without having search permissions on
> every directory in the path, the function can not determine if the
> file exists or not. So throwing an error is better than just returning
> false which means that
> the file does not exist.

I was looking at the original set of threads related to JIT, and this
has been mentioned nowhere. I think that I'm going to give it a shot
and see how the buildfarm reacts. If that finishes with red, we could
always revert this part of the patch in jit.c still keep the
refactored routine.

>> Yeah, that's an intended design choice to keep the code simpler and
>> faster as there is no need to track the library and function names in
>> the local caches or implement something similar to invalidation
>> messages for this facility because it would impact performance anyway
>> in the call paths. In short, just don't do that, or use two distinct
>> points.
>
> In practice the InjectionPointDetach() and InjectionPointAttach()
> calls may not be close by and the user may not be able to figure out
> why the injection points are behaviour weird. It may impact
> performance but unexpected behaviour should be avoided, IMO.
>
> If nothing else this should be documented.

In all the infrastructures I've looked at, folks did not really care
about having an invalidation for the callbacks loaded. Still I'm OK
to add something in the documentation about that, say among the lines
of an extra sentence like:
"The callbacks loaded by a process are cached within each process.
There is no invalidation facility for the callbacks attached to
injection points, hence updating a callback for an injection point
requires a restart of the process to release its cache and the
previous callbacks attached to it."

> I am ok with not populating the cache but checking with just
> load_external_function(). This is again another ease of use scenario
> where a silly mistake by user is caught earlier making user's life
> easier. That at least should be the goal of the first cut.

I don't really aim for complicated here, just useful.

> With v6 I could run the test when built with enable_injection_point
> false. I just ran make check in that folder. Didn't test meson build.

The CI has been failing because 041_invalid_checkpoint_after_promote
was loading Time::HiRes::nanosleep and Windows does not support it.

> Yeah, I think we have to use another shared state. If the waiting
> backend moves ahead without test_injection_point_wake() being called,
> that could lead to unexpected and very weird behaviour.
>
> It looks like ConditionVariable just remembers the processes that need
> to be woken up during broadcast or signal. But by itself it doesn't
> guarantee desired condition when woken up.

Yeah, I'm not sure yet about how to do that in the most elegant way.
But this part could always happen after 0001~0003.

>> Attached is a v7 series. What do you think? 0004 and 0005 for the
>> extra tests still need more discussion and much more polishing, IMO.
>
> Generally I think the 0001 and 0002 are in good shape. However, I
> would like them to be more easy to use - like catching simple user
> errors that can be easily caught. That saves a lot of frustration
> because of unexpected behaviour. I will review 0001 and 0002 from v7
> in detail again, but it might take a few days.

Thanks again for the reviews. I still intend to focus solely on 0001,
0002 and 0003 for the current commit fest to have something able to
enforce error states in backends, at least. There have been quite a
few bugs that could have coverage thanks for that.
--
Michael

Attachment Content-Type Size
v8-0001-Refactor-code-to-check-file-file-existence.patch text/x-diff 5.2 KB
v8-0002-Add-backend-facility-for-injection-points.patch text/x-diff 23.1 KB
v8-0003-Add-test-module-injection_points.patch text/x-diff 13.9 KB
v8-0004-Add-regression-test-to-show-snapbuild-consistency.patch text/x-diff 4.7 KB
v8-0005-Add-basic-test-for-promotion-and-restart-race-con.patch text/x-diff 10.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-01-11 04:20:19 Re: Documentation to upgrade logical replication cluster
Previous Message Tom Lane 2024-01-11 04:02:00 Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows