Re: Weird test mixup

From: Noah Misch <noah(at)leadboat(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Weird test mixup
Date: 2024-05-02 19:35:55
Message-ID: 20240502193555.a1.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 02, 2024 at 04:27:12PM +0900, Michael Paquier wrote:
> On Wed, May 01, 2024 at 04:12:14PM -0700, Noah Misch wrote:
> > While writing an injection point test, I encountered a variant of the race
> > condition that f4083c4 fixed. It had three sessions and this sequence of
> > events:
> >
> > s1: local-attach to POINT
> > s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
> > s3: detach POINT, deleting the InjectionPointCondition record
> > s2: wake up and run POINT as though it had been non-local

I should have given a simpler example:

s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s1: exit
s2: wake up and run POINT as though it had been non-local

> Fun. One thing I would ask is why it makes sense to be able to detach
> a local point from a different session than the one who defined it as
> local. Shouldn't the operation of s3 be restricted rather than
> authorized as a safety measure, instead?

(That's orthogonal to the race condition.) When s1 would wait at the
injection point multiple times in one SQL statement, I like issuing the detach
from s3 so s1 waits at just the first encounter with the injection point.
This mimics setting a gdb breakpoint and deleting that breakpoint before
"continue". The alternative, waking s1 repeatedly until it finishes the SQL
statement, is less convenient. (I also patched _detach() to wake the waiter,
and I plan to propose that.)

> > Separately, injection_points_cleanup() breaks the rules by calling
> > InjectionPointDetach() while holding a spinlock. The latter has an
> > elog(ERROR), and reaching that elog(ERROR) leaves a stuck spinlock. I haven't
> > given as much thought to solutions for this one.
>
> Indeed. That's a brain fade. This one could be fixed by collecting
> the point names when cleaning up the conditions and detach after
> releasing the spinlock. This opens a race condition between the
> moment when the spinlock is released and the detach, where another
> backend could come in and detach a point before the shmem_exit
> callback has the time to do its cleanup, even if detach() is
> restricted for local points. So we could do the callback cleanup in

That race condition seems fine. The test can be expected to control the
timing of backend exits vs. detach calls. Unlike the InjectionPointRun()
race, it wouldn't affect backends unrelated to the test.

> three steps in the shmem exit callback:
> - Collect the names of the points to detach, while holding the
> spinlock.
> - Do the Detach.
> - Take again the spinlock, clean up the conditions.
>
> Please see the attached.

The injection_points_cleanup() parts look good. Thanks.

> @@ -403,6 +430,10 @@ injection_points_detach(PG_FUNCTION_ARGS)
> {
> char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
>
> + if (!injection_point_allowed(name))
> + elog(ERROR, "cannot detach injection point \"%s\" not allowed to run",
> + name);
> +

As above, I disagree with the injection_points_detach() part.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Cleveland 2024-05-02 20:42:05 Re: Why is FOR ORDER BY function getting called when the index is handling ordering?
Previous Message Pavel Stehule 2024-05-02 18:24:18 Re: Idea Feedback: psql \h misses -> Offers Links?