Re: stress test for parallel workers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	mark(at)2ndquadrant(dot)com
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: stress test for parallel workers
Date:	2019-10-10 21:34:51
Message-ID:	5350.1570743291@sss.pgh.pa.us
Lists:	pgsql-hackers

I wrote:
>>> Yeah, I've been wondering whether pg_ctl could fork off a subprocess
>>> that would fork the postmaster, wait for the postmaster to exit, and then
>>> report the exit status.
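
As a rough illustration of the idea quoted above (not pg_ctl's actual code, which is written in C), the sketch below launches a child process, waits for it, and reports how it ended. The function name `run_and_report` and the stand-in child command are hypothetical; a real wrapper would start the postmaster instead.

```python
import signal
import subprocess
import sys


def run_and_report(argv):
    """Start argv as a child process, wait for it, and describe its fate."""
    proc = subprocess.Popen(argv)
    returncode = proc.wait()
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        return f"child killed by {signal.Signals(-returncode).name}"
    return f"child exited with status {returncode}"


# Stand-in for the postmaster: a child that simply exits with status 3.
print(run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

The point of the intermediate process is only that something is still alive to call wait() after the postmaster dies, so the exit status is not lost.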

> [ pushed at 6a5084eed ]
> Given wobbegong's recent failure rate, I don't think we'll have to wait
> long.

Indeed, we didn't:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wobbegong&dt=2019-10-10%2020%3A54%3A46

The tail end of the system log looks like

2019-10-10 21:00:33.717 UTC [15127:306] pg_regress/date FATAL: postmaster exited during a parallel transaction
2019-10-10 21:00:33.717 UTC [15127:307] pg_regress/date LOG: disconnection: session time: 0:00:02.896 user=fedora database=regression host=[local]
/bin/sh: line 1: 14168 Segmentation fault (core dumped) "/home/fedora/build-farm-10-clang/buildroot/HEAD/pgsql.build/tmp_install/home/fedora/build-farm-clang/buildroot/HEAD/inst/bin/postgres" -F -c listen_addresses="" -k "/tmp/pg_upgrade_check-ZrhQ4h"
postmaster exit status is 139

So that's definitive proof that the postmaster is suffering a SIGSEGV.
Unfortunately, we weren't blessed with a stack trace, even though
wobbegong is running a buildfarm client version that is new enough
to try to collect one. However, seeing that wobbegong is running
a pretty-recent Fedora release, the odds are that systemd-coredump
has commandeered the core dump and squirreled it someplace where
we can't find it.
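
For readers unfamiliar with the convention: the figure 139 is consistent with the usual POSIX shell practice of reporting a child killed by signal N as status 128 + N, and 139 - 128 = 11, which is SIGSEGV on Linux. A minimal sketch of that decoding follows; the helper name `describe_shell_status` is made up for illustration.

```python
import signal


def describe_shell_status(status: int) -> str:
    """Interpret an exit status as a POSIX shell would have reported it."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(describe_shell_status(139))
```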

Much as one could wish otherwise, systemd doesn't seem likely to
either go away or scale back its invasiveness; so I'm afraid we
are probably going to need to teach the buildfarm client how to
negotiate with systemd-coredump at some point. I don't much want
to do that right this minute, though.
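
One possible first step for such support, sketched here under assumptions rather than taken from the buildfarm client, would be to detect whether the kernel hands core dumps to systemd-coredump at all. On Linux, a `core_pattern` value beginning with `|` means the dump is piped to the named program instead of being written to a file. The function names below are hypothetical.

```python
from pathlib import Path


def core_dump_handler(core_pattern: str) -> str:
    """Classify a kernel.core_pattern value."""
    pattern = core_pattern.strip()
    if pattern.startswith("|"):
        # The first word after '|' is the program that receives the dump.
        parts = pattern[1:].split()
        program = parts[0] if parts else ""
        if "systemd-coredump" in program:
            return "systemd-coredump"
        return f"piped to {program}"
    return "written to a file"


def current_core_dump_handler(path="/proc/sys/kernel/core_pattern"):
    """Read the live setting (Linux only) and classify it."""
    return core_dump_handler(Path(path).read_text())
```

If the answer is `systemd-coredump`, the dump would normally have to be fetched through `coredumpctl` rather than looked for in the data directory.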

A nearer-term solution would be to reproduce this manually and
dig into the core. Mark, are you in a position to give somebody
ssh access to wobbegong's host, or another similarly-configured VM?

(While at it, it'd be nice to investigate the infinite_recurse
failures we've been seeing on all those ppc64 critters ...)

regards, tom lane

In response to

Re: stress test for parallel workers at 2019-10-07 04:07:48 from Tom Lane

Responses

Re: stress test for parallel workers at 2019-10-10 21:53:51 from Mark Wong
Re: stress test for parallel workers at 2019-10-10 22:01:14 from Andrew Dunstan
