Re: fairywren failures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: fairywren failures
Date: 2019-10-03 20:13:05
Message-ID: 29222.1570133585@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> * It's certainly curious that the failures so far only have happened as
> part of pg_upgradeCheck, rather than the plain regression tests.

Isn't it though. We spent a long time wondering why we saw parallel
plan instability mostly in pg_upgradeCheck, too [1]. We eventually
decided that the cause of that instability was chance timing collisions
with bgwriter/checkpointer, but nobody ever really explained why
pg_upgradeCheck should be more prone to hit those windows than the plain
tests are. I feel like there's something still to be understood there.

Whether this is related, who's to say. But given your thought about
stack alignment, I'm half thinking that the crash is seen when we get a
signal (e.g. SIGUSR1 from sinval processing) at the wrong time, allowing
the stack to become unaligned, and that the still-unexplained timing
difference in pg_upgradeCheck accounts for that test being more prone to
show it.

regards, tom lane

[1] https://www.postgresql.org/message-id/20190605050037.GA33985@rfd.leadboat.com
