Re: POC: Better infrastructure for automated testing of concurrency issues

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Better infrastructure for automated testing of concurrency issues
Date: 2020-12-04 18:57:13
Message-ID: CAH2-Wzkb+tfYuiM5=eNqfzCbTmvXo2oubJgRtAvqGpmgU8V5_Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 25, 2020 at 6:11 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> While the postgres community does a great job on investigating and fixing the problems, our ability to reproduce concurrency issues in the source code test suites is limited.

+1. This seems really cool.

> For sure, evaluation of stop events causes a CPU overhead. This is why it's controlled by enable_stopevents GUC, which is off by default. I expect the overhead with enable_stopevents = off shouldn't be observable. Even if it would be observable, we could enable stop events only by specific configure parameter. There is also trace_stopevents GUC, which traces all the stop events to the log with debug2 level.

But why even risk adding noticeable overhead when "enable_stopevents =
off"? Even if it's a very small risk? We can still get most of the
benefit by enabling it only on certain builds and buildfarm animals.
It will be a bit annoying to not have stop events enabled in all
builds, but it avoids the problem of even having to think about the
overhead, now or in the future. I think that that trade-off is a good
one. Even if the performance trade-off is judged perfectly for the
first few tests you add, what are the chances that it will stay that
way as the infrastructure is used in more and more places? What if you
need to add a test to the back branches? Since we don't anticipate any
direct benefit for users (right?), I think that this question is
simple.
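
To make the build-time option concrete, here is a minimal sketch of the
kind of gating I have in mind. Every name in it (USE_STOPEVENTS,
STOPEVENT(), handle_stopevent(), insert_tuple()) is hypothetical and is
not taken from the patch; it only illustrates the idea that a standard
build compiles the check away entirely, while a specially configured
build still consults the GUC at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* All names here are hypothetical stand-ins, not taken from the patch. */
bool enable_stopevents = false;  /* models the run-time GUC */
int  stopevent_hits = 0;         /* counts stop events that actually fired */

#ifdef USE_STOPEVENTS
/* Only compiled into builds configured with -DUSE_STOPEVENTS. */
static void
handle_stopevent(const char *name)
{
    assert(name != NULL);
    stopevent_hits++;
    fprintf(stderr, "stop event reached: %s\n", name);
}

#define STOPEVENT(name) \
    do { \
        if (enable_stopevents) \
            handle_stopevent(name); \
    } while (0)
#else
/* Standard builds: expands to nothing, so no branch and no GUC check. */
#define STOPEVENT(name) ((void) 0)
#endif

/* Example call site in a hot code path. */
int
insert_tuple(int value)
{
    STOPEVENT("before_insert");
    return value + 1;
}
```

With something along these lines, only builds that define the symbol
(say, certain buildfarm animals) pay for the check, and nobody has to
reason about the cost of a new call site on standard builds.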

To be clear, I'm not arguing against enabling stop events on standard
builds because the infrastructure isn't useful -- it's *very* useful. Useful
enough that it would be nice to be able to use it extensively without
really thinking about the performance hit each time. I know that I'll
be *far* more likely to use it if I don't have to waste time and
energy on that aspect every single time.

--
Peter Geoghegan
