Re: BUG #17949: Adding an index introduces serialisation anomalies.

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Artem Anisimov <artem(dot)anisimov(dot)255(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: BUG #17949: Adding an index introduces serialisation anomalies.
Date: 2023-07-16 22:04:29
Message-ID: CA+hUKGKOqpuHx_tx7qpbTX4o49YnCFrnB2uE3B+PUy03bBTPBA@mail.gmail.com
Lists: pgsql-bugs

On Sat, Jul 15, 2023 at 1:05 AM Artem Anisimov
<artem(dot)anisimov(dot)255(at)gmail(dot)com> wrote:
> thank you for the fixes. I've looked up the patches in pg's git repo,
> and they got me wondering: where is the repo with pg tests? I'd be
> really uneasy to make changes to concurrency-related code without a
> decent testsuite to verify them.

Generally, the tests for SSI are in:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/test/isolation/specs

... and see also ../expected for the expected output of each spec.
Typically these tests are written as features are developed, but we
also add new tests to cover complicated bug fixes when we can see how
to do it. There are also non-SSI tests in there, because the
"isolation" infrastructure turned out to be so useful.
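As a rough illustration of what statement-level scheduling means here: an isolation spec defines sessions made of named steps, and when the spec gives no explicit `permutation` lines, the isolation tester runs every interleaving of those steps that keeps each session's own order. The Python below is only a toy model of that enumeration, not the tester's actual code (which is written in C); the step names are hypothetical.

```python
from typing import List


def interleavings(a: List[str], b: List[str]) -> List[List[str]]:
    """Return every merge of session a's steps with session b's steps
    that preserves the order of steps within each session."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    # Either session a runs its next step first, or session b does.
    with_a_first = [[a[0]] + rest for rest in interleavings(a[1:], b)]
    with_b_first = [[b[0]] + rest for rest in interleavings(a, b[1:])]
    return with_a_first + with_b_first


if __name__ == "__main__":
    # Hypothetical step names; in a real spec each step is a SQL statement.
    s1 = ["s1_read", "s1_write"]
    s2 = ["s2_read", "s2_write"]
    for schedule in interleavings(s1, s2):
        print(" ".join(schedule))
```

Because the unit of scheduling is a whole step (statement), a race that lives entirely inside one statement's execution cannot be targeted this way.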

For the problems discovered in this thread, I couldn't see how to do
it. These required unlucky scheduling to go wrong -- whereas the
existing test infrastructure is based on deterministic behaviour with
wait points at the statement level. It has been suggested before that
we could perhaps have a way to insert test-harness-controlled
wait points. But even if we had such infrastructure, the relevant wait
points are actually gone after the fixes (i.e., the window in which you
had to do something in another thread to cause problems has been closed,
so there is no candidate wait point left). Such infrastructure might
have been useful for demonstrating the bugs deterministically while
the windows existed. One of the basic techniques we often use when
trying to understand what is going on in such cases is to insert
sleeps into interesting places to widen windows and make failures
"almost" deterministic, as I did for one of the cases here.
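The sleep trick can be seen in miniature with a toy lost-update race. This is a hypothetical Python sketch, not PostgreSQL code: two threads do an unsynchronised read-modify-write of a shared counter, and an artificial `time.sleep()` injected between the read and the write widens the window so that the lost update, normally rare, becomes almost certain.

```python
import threading
import time


def run_once(delay: float) -> int:
    """Run two threads that each increment a shared counter without
    locking. `delay` is an artificial sleep injected between the read
    and the write. Returns the final counter value: 2 is correct,
    1 means one update was lost."""
    state = {"counter": 0}

    def worker() -> None:
        value = state["counter"]      # read
        if delay:
            time.sleep(delay)         # injected sleep: widens the race window
        state["counter"] = value + 1  # write; may clobber the other thread's update

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


if __name__ == "__main__":
    # Without the sleep the anomaly is rare; with it, both threads read
    # the old value before either writes.
    print("no delay:", run_once(0.0))
    print("with delay:", run_once(0.1))
```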

I suppose we could in theory have a suite of 'high load' tests of a
more statistical nature that could include things like the repro you
sent in. It would burn a whole bunch of CPU trying to break
complicated concurrency stuff in ways that have been known to be
broken in the past. I'm not sure it's worth it though. Sometimes
it's OK for tests to be temporarily useful, too...
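A test of that statistical kind would boil down to a loop that runs a repro many times and counts how often the anomaly shows up. A minimal Python sketch, assuming the repro can be wrapped as a function that takes a seed and returns `True` when it observed an anomaly; `stress` and `toy_repro` are hypothetical names, and `toy_repro` merely simulates a rare failure rather than talking to a database.

```python
import concurrent.futures
import random
from typing import Callable


def stress(repro: Callable[[int], bool], iterations: int, workers: int = 8) -> int:
    """Run repro(seed) for seeds 0..iterations-1 on a thread pool and
    return how many runs reported an anomaly."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(repro, range(iterations))
        return sum(1 for failed in results if failed)


def toy_repro(seed: int) -> bool:
    # Hypothetical stand-in for a real repro script: the "bug" is
    # simulated as a roughly 1-in-100 chance, reproducible from the seed.
    return random.Random(seed).random() < 0.01


if __name__ == "__main__":
    failures = stress(toy_repro, 10_000)
    print(f"{failures} anomalies in 10000 runs")
```

A real harness would more likely drive many database connections from separate client processes, since Python threads share one interpreter lock; the thread pool here only shows the shape of the loop.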
