From: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Online enabling of checksums |
Date: | 2018-04-06 23:04:50 |
Message-ID: | 8A3A67FB-0C69-4736-8E52-D866EBC53C56@yesql.se |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On 07 Apr 2018, at 00:23, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
>> Applying this makes the _cancel test pass, moving the failure instead to the
>> following _enable test (which matches what coypu and mylodon are seeing).
>
> FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
> get the buildfarm green again.
Sorry about that, I’m a bit slow due to various $family situations at the
moment.
> I'm fairly certain that the bug here is a simple race condition in the
> test (not the main code!):
I wonder if it may perhaps be a case of both?
> It's
> exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> make sure that a process has finished exiting. Then followup tests fail
> because the process is still running
I can reproduce the error when building with --disable-atomics, and it seems
that all the failing members either do that, lack atomic.h, lack atomics or a
combination. checksumhelper cancellation use pg_atomic_unlocked_test_flag() to
test if it’s running when asked to abort, something which seems unsafe to do in
semaphore simulation as it always returns true. If I for debugging synthesize
a flag test with testset/clear, the tests pass green (with the upstream patch
for pg_atomic_test_set_flag_impl() applied). Cancelling with semaphore sim is
thus doomed to never work IIUC. Or it’s a red herring.
As Magnus mentioned upstream, rewriting to not use an atomic flag is probably
the best option, once the current failure is understood.
> really? Let's just force the test take at least 6s purely from
> sleeping?
The test needs continuous reading in a session to try and trigger any bugs in
read access on the cluster during checksumming, is there a good way to do that
in the isolationtester? I have failed to find a good way to repeat a step like
that, but I might be missing something.
cheers ./daniel
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2018-04-06 23:13:56 | Re: Online enabling of checksums |
Previous Message | Robert Haas | 2018-04-06 23:04:01 | Re: PostgreSQL 11 Release Management Team & Feature Freeze |