Re: stress test for parallel workers

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, mark(at)2ndquadrant(dot)com
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stress test for parallel workers
Date: 2019-10-10 22:01:14
Message-ID: 84813a30-8c34-9c32-7ad5-90d9eefba468@2ndQuadrant.com
Lists: pgsql-hackers


On 10/10/19 5:34 PM, Tom Lane wrote:
> I wrote:
>>>> Yeah, I've been wondering whether pg_ctl could fork off a subprocess
>>>> that would fork the postmaster, wait for the postmaster to exit, and then
>>>> report the exit status.
>> [ pushed at 6a5084eed ]
>> Given wobbegong's recent failure rate, I don't think we'll have to wait
>> long.
> Indeed, we didn't:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wobbegong&dt=2019-10-10%2020%3A54%3A46
>
> The tail end of the system log looks like
>
> 2019-10-10 21:00:33.717 UTC [15127:306] pg_regress/date FATAL: postmaster exited during a parallel transaction
> 2019-10-10 21:00:33.717 UTC [15127:307] pg_regress/date LOG: disconnection: session time: 0:00:02.896 user=fedora database=regression host=[local]
> /bin/sh: line 1: 14168 Segmentation fault (core dumped) "/home/fedora/build-farm-10-clang/buildroot/HEAD/pgsql.build/tmp_install/home/fedora/build-farm-clang/buildroot/HEAD/inst/bin/postgres" -F -c listen_addresses="" -k "/tmp/pg_upgrade_check-ZrhQ4h"
> postmaster exit status is 139
>
> So that's definitive proof that the postmaster is suffering a SIGSEGV.
> Unfortunately, we weren't blessed with a stack trace, even though
> wobbegong is running a buildfarm client version that is new enough
> to try to collect one. However, seeing that wobbegong is running
> a pretty-recent Fedora release, the odds are that systemd-coredump
> has commandeered the core dump and squirreled it someplace where
> we can't find it.
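
For reference, the "exit status is 139" in the quoted log follows the usual shell convention of reporting 128 plus the signal number when a child is killed by a signal. Below is a minimal sketch of that decoding. It assumes a Linux host, where SIGSEGV is signal 11; the function name is hypothetical and is not part of pg_ctl or the buildfarm client.

```python
import signal


def describe_shell_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Shells report 128 + N when the child was terminated by signal N,
    so anything above 128 is treated as a signal death here.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 139 is the status reported in the log above.
    print(describe_shell_status(139))
```

On a system where SIGSEGV is 11, the status 139 maps back to SIGSEGV, which matches the "Segmentation fault (core dumped)" line printed by /bin/sh.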

At least on F29 I have set /proc/sys/kernel/core_pattern and it works.
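
As a rough illustration of what to check: per core(5), a core_pattern value that starts with "|" hands the dump to a helper program, otherwise it is a file name template. The sketch below only classifies the current setting. The function name is hypothetical, and the systemd-coredump path in the comment is an example value, not the setting used on F29 or on wobbegong.

```python
from pathlib import Path

CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def describe_core_pattern(pattern: str) -> str:
    """Say whether a core_pattern value pipes dumps to a helper or writes a file."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # Everything after the pipe is a command line; the first word is the helper,
        # e.g. /usr/lib/systemd/systemd-coredump (example value only).
        words = pattern[1:].split()
        handler = words[0] if words else ""
        return f"piped to handler: {handler}"
    return f"written to file pattern: {pattern}"


if __name__ == "__main__":
    # Only meaningful on Linux, where this procfs entry exists.
    if CORE_PATTERN_PATH.exists():
        print(describe_core_pattern(CORE_PATTERN_PATH.read_text()))
```

If the result shows a piped handler, the core file will not appear in the working directory, which is the situation Tom describes.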

>
> Much as one could wish otherwise, systemd doesn't seem likely to
> either go away or scale back its invasiveness; so I'm afraid we
> are probably going to need to teach the buildfarm client how to
> negotiate with systemd-coredump at some point. I don't much want
> to do that right this minute, though.
>
> A nearer-term solution would be to reproduce this manually and
> dig into the core. Mark, are you in a position to give somebody
> ssh access to wobbegong's host, or another similarly-configured VM?

I have given Mark my SSH key. That doesn't mean others who are interested shouldn't do the same.

>
> (While at it, it'd be nice to investigate the infinite_recurse
> failures we've been seeing on all those ppc64 critters ...)
>
>

Yeah.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
