Re: intermittent failures in Cygwin from select_parallel tests

From: Noah Misch <noah(at)leadboat(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: intermittent failures in Cygwin from select_parallel tests
Date: 2021-06-22 06:42:12
Message-ID: 20210622064212.GA1367859@rfd.leadboat.com
Lists: pgsql-hackers

On Tue, Jun 22, 2021 at 05:52:03PM +1200, Thomas Munro wrote:
> On Tue, Jun 22, 2021 at 5:21 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Thu, Aug 03, 2017 at 10:45:50AM -0400, Robert Haas wrote:
> > > On Wed, Aug 2, 2017 at 11:47 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > > postmaster algorithms rely on the PG_SETMASK() calls preventing that. Without
> > > > such protection, duplicate bgworkers are an understandable result. I caught
> > > > several other assertions; the PMChildFlags failure is another case of
> > > > duplicate postmaster children:
> > > >
> > > > 6 TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File: "pgstat.c", Line: 871)
> > > > 3 TRAP: FailedAssertion("!(PMSignalState->PMChildFlags[slot] == 1)", File: "pmsignal.c", Line: 229)
> > > > 20 TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2523)
> > > > 21 TRAP: FailedAssertion("!(vmq->mq_sender == ((void *)0))", File: "shm_mq.c", Line: 221)
> > > > Also, got a few "select() failed in postmaster: Bad address"
> > > >
> > > > I suspect a Cygwin signals bug. I'll try to distill a self-contained test
> > > > case for the Cygwin hackers. The lack of failures on buildfarm member brolga
> > > > argues that older Cygwin is not affected.
> > >
> > > Nice detective work.
> >
> > Thanks. http://marc.info/?t=150183296400001 has my upstream report. The
> > Cygwin project lead reproduced this, but a fix remained elusive.
> >
> > I guess we'll ignore weird postmaster-associated lorikeet failures for the
> > foreseeable future.
>
> While reading a list of recent build farm assertion failures I learned that
> this is still broken in Cygwin 3.2, and eventually found my way back
> to this thread.

Interesting. Which branch(es) showed you failures? I had wondered if the
move to sa_mask (commit 9abb2bfc) would effectively end the problem in v13+.
Perhaps the Cygwin bug pokes through even that. Perhaps the sa_mask
conditionals need to be "#if defined(WIN32) && !defined(__CYGWIN__)" to help
current buildfarm members.
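
For readers unfamiliar with sa_mask, here is a minimal sketch of the general POSIX mechanism, not the code from commit 9abb2bfc. It assumes a single-threaded process; the choice of SIGUSR1/SIGUSR2 and every function name in it are hypothetical, picked only for illustration. Each handler is installed with a full sa_mask, so a second signal raised while the first handler runs should stay pending until that handler returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flags written only from signal handlers (hypothetical names). */
static volatile sig_atomic_t in_usr1 = 0;
static volatile sig_atomic_t usr2_ran = 0;
static volatile sig_atomic_t usr2_nested = 0;

static void
handle_usr2(int signo)
{
	(void) signo;
	if (in_usr1)
		usr2_nested = 1;	/* would mean sa_mask was not honored */
	usr2_ran = 1;
}

static void
handle_usr1(int signo)
{
	(void) signo;
	in_usr1 = 1;
	raise(SIGUSR2);			/* blocked by sa_mask, so it stays pending */
	in_usr1 = 0;
}

/* Install fn for signo with every signal blocked while fn runs. */
static void
install_blocking_handler(int signo, void (*fn) (int))
{
	struct sigaction act;
	int			rc;

	memset(&act, 0, sizeof(act));
	act.sa_handler = fn;
	sigfillset(&act.sa_mask);
	act.sa_flags = SA_RESTART;
	rc = sigaction(signo, &act, NULL);
	assert(rc == 0);
	(void) rc;
}

/* Returns 1 if the SIGUSR2 handler interrupted the SIGUSR1 handler. */
int
usr2_ran_inside_usr1(void)
{
	install_blocking_handler(SIGUSR1, handle_usr1);
	install_blocking_handler(SIGUSR2, handle_usr2);
	raise(SIGUSR1);
	return usr2_nested;
}

/* Returns 1 once the SIGUSR2 handler has run at all. */
int
usr2_handler_ran(void)
{
	return usr2_ran;
}
```

On a system that honors sa_mask, usr2_ran_inside_usr1() should return 0 and a later usr2_handler_ran() should return 1. A nonzero result from the first call would be the kind of handler overlap that the suspected Cygwin bug could produce.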

> I was wondering about suggesting some kind of
> official warning, but I guess the manual already covers it with this
> 10 year old notice. I don't know much about Windows or Cygwin so I'm
> not sure if it needs updating or not, but I would guess that there are
> no longer any such systems?
>
> <productname>Cygwin</productname> is not recommended for running a
> production server, and it should only be used for running on
> older versions of <productname>Windows</productname> where
> the native build does not work.

I expect native builds work on all Microsoft-supported Windows versions, so +1
for removing everything after the comma.
