Re: Weird test mixup

From: Noah Misch <noah(at)leadboat(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Weird test mixup
Date: 2024-05-01 23:12:14
Message-ID: 20240501231214.40@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While writing an injection point test, I encountered a variant of the race
condition that f4083c4 fixed. It had three sessions and this sequence of
events:

s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s3: detach POINT, deleting the InjectionPointCondition record
s2: wake up and run POINT as though it had been non-local

On Sat, Mar 16, 2024 at 08:40:21AM +0900, Michael Paquier wrote:
> On Fri, Mar 15, 2024 at 11:23:31AM +0200, Heikki Linnakangas wrote:
> > Wrt. the spinlock and shared memory handling, I think this would be simpler
> > if you could pass some payload in the InjectionPointAttach() call, which
> > would be passed back to the callback function:
> >
> > In this case, the payload would be the "slot index" in shared memory.
> >
> > Or perhaps always allocate, say, 1024 bytes of working area for every
> > attached injection point that the test module can use any way it wants. Like
> > for storing extra conditions, or for the wakeup counter stuff in
> > injection_wait(). A fixed size working area is a little crude, but would be
> > very handy in practice.

That would be one good way to solve it. (Storing a slot index has the same
race condition, but it fixes the race to store a struct containing the PID.)

The best alternative I see is to keep an InjectionPointCondition forever after
creating it. Give it a "bool valid" field that we set on detach. I don't see
a major reason to prefer one of these over the other. One puts a negligible
amount of memory pressure on the main segment, but it simplifies the module
code. I lean toward the "1024 bytes of working area" idea. Other ideas or
opinions?

Separately, injection_points_cleanup() breaks the rules by calling
InjectionPointDetach() while holding a spinlock. The latter has an
elog(ERROR), and reaching that elog(ERROR) leaves a stuck spinlock. I haven't
given as much thought to solutions for this one.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2024-05-01 23:22:24 Re: [PATCH] json_lex_string: don't overread on bad UTF8
Previous Message Dmitry Koval 2024-05-01 19:51:24 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands