Re: POC: Better infrastructure for automated testing of concurrency issues

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Better infrastructure for automated testing of concurrency issues
Date: 2020-12-07 17:31:21
Message-ID: CAPpHfdsn-hzneYNbX4qcY5rnwr-BA1ogOCZ4TQCKQAw9qa48kA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Mon, Dec 7, 2020 at 9:10 AM Craig Ringer
<craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> On Wed, 25 Nov 2020 at 22:11, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> PostgreSQL is a complex multi-process system, and we are periodically faced with complicated concurrency issues. While the postgres community does a great job on investigating and fixing the problems, our ability to reproduce concurrency issues in the source code test suites is limited.
>>
>> I think we currently have two general ways to reproduce the concurrency issues.
>> 1. A text scenario for manual reproduction of the issue, which could involve psql sessions, gdb sessions etc. Couple of examples are [1] and [2]. This method provides reliable reproduction of concurrency issues. But it's hard to automate, because it requires external instrumentation (debugger) and it's not stable in terms of postgres code changes (that is particular line numbers for breakpoints could be changed). I think this is why we currently don't have such scenarios among postgres test suites.
>> 2. Another way is to reproduce the concurrency issue without directly touching the database internals using pgbench or other way to simulate the workload (see [3] for example). This way is easier to automate, because it doesn't need external instrumentation and it's not so sensitive to source code changes. But at the same time this way is not reliable and is resource-consuming.
>
> Agreed.
>
> For a useful but limited set of cases there's (3) the isolation tester and pg_isolation_regress. But IIRC the patches to teach it to support multiple upstream nodes never got in, so it's essentially useless for any replication related testing.
>
> There's also (4), write a TAP test that uses concurrent psql sessions via IPC::Run. Then play games with heavyweight or advisory lock waits to order events, use instance starts/stops, change ports or connstrings to simulate network issues, use SIGSTOP/SIGCONTs, add src/test/modules extensions that inject faults or provide custom blocking wait functions for the event you want, etc. I've done that more than I'd care to, and I don't want to do it any more than I have to in future.

Sure, there are isolation tester and TAP tests. I just meant the
scenarios, where we can't reliably reproduce using either isolation
tests or tap tests. Sorry for confusion.

> In some cases I've gone further and written tests that use systemtap in "guru" mode (read/write, with embedded C enabled) to twiddle the memory of the target process(es) when a probe is hit, e.g. to modify a function argument or return value or inject a fault. Not exactly portable or convenient, though very powerful.

Exactly, systemtap is good, but we need something more portable and
convenient for builtin test suites.

>> In the view of above, I'd like to propose a POC patch, which implements new builtin infrastructure for reproduction of concurrency issues in automated test suites. The general idea is so-called "stop events", which are special places in the code, where the execution could be stopped on some condition. Stop event also exposes a set of parameters, encapsulated into jsonb value. The condition over stop event parameters is defined using jsonpath language.
>
>
> The patched PostgreSQL used by 2ndQuadrant internally has a feature called PROBE_POINT()s that is somewhat akin to this. Since it's not a customer facing feature I'm sure I can discuss it here, though I'll need to seek permission before I can show code.
>
> TL;DR: PROBE_POINT()s let you inject ERRORs, sleeps, crashes, and various other behaviour at points in the code marked by name, using GUCs, hooks loaded from test extensions, or even systemtap scripts to control what fires and when. Performance impact is essentially zero when no probes are currently enabled at runtime, so they're fine for cassert builds.
>
> Details:
>
> A PROBE_POINT() is a macro that works as a marker, a bit like a TRACE_POSTGRESQL_.... dtrace macro. But instead of the super lightweight tracepoint that SDT marker points emit, a PROBE_POINT tests an unlikely(probe_points_enabled) flag, and if true, it prepares arguments for the probe handler: A probe name, probe action, sleep duration, and a hit counter.
>
> The default probe action and sleep duration come from GUCs. So your control of the probe is limited to the granularity you can easily manage GUCs at. That's often sufficient
>
> But if you want finer control for something, there are two ways to achieve it.
>
> After picking the default arguments to the handler, the probe point checks for a hook. If defined, it calls it with the probe point name and pointers to the action and sleep duration values, so the hook function can modify them per probe-point hit. That way you can use in src/test/modules extensions or your own test extensions first, with the probe point name as an argument and the action and sleep duration as out-params, as well as any accessible global state, custom GUCs you define in your test extension, etc. That's usually enough to target a probe very specifically but it's a bit of a hassle.
>
> Another option is to use a systemtap script. You can write your code in systemtap with its language. When the systemtap marker for a probe point event fires, decide if it's the one you want and twiddle the target process variables that store the probe action and sleep duration from the systemtap script. I find this much more convenient for day to day testing, but because of systemtap portability challenges I don't find it as useful for writing regression tests for repeat use.
>
> A PROBE_POINT() actually emits dtrace/perf SDT markers if postgres was compiled with --enable-dtrace too, so you can use them with perf, systemtap, bpftrace or whatever for read-only use. Including optional arguments to the probe point. Exactly as if it was a TRACE_POSTGRESQL_foo point, but without needing to hack probes.d for each one.
>
> The PROBE_POINT() implementation can fake signal delivery with signal actions, which has been handy too.
>
> I also have a version of the code that takes arguments to the PROBE_POINT() and passes them to the handler function as a va_list too, with a compile-time-generated array of argument types inferred by C11 _Generic as the first argument. So your handler function can be passed probe-point-specific contextual info like the current xid being committed or whatever. This isn't currently deployed.
>
> The advantage of the PROBE_POINT() approach has been that it's generally very cheap to check whether a probe point should fire, and it's basically free to skip them if there are no probe points enabled right now. If we hashed the probe point names for the initial comparisons it'd be faster still.
>
> I will seek approval to share the relevant code.

It's nice to know that we've also worked in this direction. I was a
bit surprised when I didn't find relevant patches published in the
mailing lists. I hope you would be able to share the code, it would
be very nice to see.

>> Following functions control behavior –
>> * pg_stopevent_set(stopevent_name, jsonpath_conditon) – sets condition for the stop event. Once the function is executed, all the backends, which run a given stop event with parameters satisfying the given jsonpath condition, will be stopped.
>> * pg_stopevent_reset(stopevent_name) – resets stop events. All the backends previously stopped on a given stop event will continue the execution.
>
>
> Does that offer any way to affect early startup, late shutdown, servers in warm standby, etc? Or for that matter, any way to manipulate bgworkers and auxprocs or the postmaster itself, things you can't run a query on directly?

Using the current version of patch you can manipulate bgworkers and
auxprocs as soon as they're connected to the shmem. We can write
queries from another backend and the setting affects the whole
cluster. I'm planning to add the ability to access the process
information from the jsonpath condition. So, we would be able to
choose which process to stop on the stop event.

Early startup, late shutdown, servers in warm standby are not
supported yet. I think this could be done using GUCa and hooks +
custom extensions in the similar way you describe it for
PROBE_POINT().

Also, I don't think we need to support everything at once. It would
be nice to get something simple as soon as we have a clear roadmap of
how to add the rest of the features later.

> Also, based on my experience using PROBE_POINT()s I would suggest that in addition to a stop or start "event", it's desirable to be able to elog(PANIC), elog(ERROR), elog(LOG), and/or sleep() for a certain duration. I've found all to be extremely useful.
>
>> In the code stop events are defined using macro STOPEVENT(event_id, params). The 'params' should be a function call, and it's evaluated only if stop events are enabled. pg_isolation_test_session_is_blocked() takes stop events into account.
>
>
> Oooh, that I like.
>
> PROBE_POINT()s don't do that, and it's annoying.

Thank you for your feedback. I'm looking forward if you can publish
the PROBE_POINT() work.

------
Regards,
Alexander Korotkov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-12-07 17:53:29 Change default of checkpoint_completion_target
Previous Message Amit Langote 2020-12-07 16:16:00 Re: Huge memory consumption on partitioned table with FKs