Re: Online enabling of checksums

From: Andres Freund <andres(at)anarazel(dot)de>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Online enabling of checksums
Date: 2018-04-06 23:13:56
Message-ID: 20180406231356.l7s6dmdbi76nc7tf@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> > I'm fairly certain that the bug here is a simple race condition in the
> > test (not the main code!):
>
> I wonder if it may perhaps be a case of both?

See my other message about the atomic fallback bit.

> > It's
> > exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> > make sure that a process has finished exiting. Then followup tests fail
> > because the process is still running
>
> I can reproduce the error when building with --disable-atomics, and it seems
> that all the failing members either do that, lack atomic.h, lack atomics or a
> combination.

atomic.h isn't important; it's only relevant for Solaris (IIRC). Only
one of the failing ones lacks atomics, AFAICT. See:

On 2018-04-06 14:19:09 -0700, Andres Freund wrote:
> Is that an explanation for
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
> ? Those all don't seem to fall under that? Having proper atomics?

So for those it's the timing. Note that they didn't always fail, either.
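
For illustration only (this is not code from the patch or from the test
suite): the usual alternative to a fixed pg_sleep(1) is to poll for the
condition the test actually depends on, with a deadline. A minimal sketch,
where wait_until, condition, timeout and interval are all hypothetical names
and the condition would be whatever check shows that the process has exited:

```python
import time


def wait_until(condition, timeout=30.0, interval=0.1):
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was observed, False on timeout.
    All names here are hypothetical; nothing is taken from the patch.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for "has the worker process exited yet?": becomes true
    # on the third poll, however long that takes on a given machine.
    polls = {"n": 0}

    def worker_has_exited():
        polls["n"] += 1
        return polls["n"] >= 3

    print(wait_until(worker_has_exited, timeout=5.0, interval=0.01))
```

Unlike a fixed sleep, this returns as soon as the condition holds on a fast
machine and keeps waiting, up to the deadline, on a slow one.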

> > really? Let's just force the test to take at least 6s purely from
> > sleeping?
>
> The test needs continuous reading in a session to try and trigger any bugs in
> read access on the cluster during checksumming, is there a good way to do that
> in the isolationtester? I have failed to find a good way to repeat a step like
> that, but I might be missing something.

I don't know, but I do know this isn't right.

Greetings,

Andres Freund
