Re: stress test for parallel workers

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, mark(at)2ndquadrant(dot)com
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stress test for parallel workers
Date: 2019-10-11 15:12:28
Message-ID: 06d6be5a-7af2-ef78-0abc-6bfe500ead87@2ndQuadrant.com
Lists: pgsql-hackers


On 10/10/19 6:01 PM, Andrew Dunstan wrote:
> On 10/10/19 5:34 PM, Tom Lane wrote:
>> I wrote:
>>>>> Yeah, I've been wondering whether pg_ctl could fork off a subprocess
>>>>> that would fork the postmaster, wait for the postmaster to exit, and then
>>>>> report the exit status.
>>> [ pushed at 6a5084eed ]
>>> Given wobbegong's recent failure rate, I don't think we'll have to wait
>>> long.
>> Indeed, we didn't:
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wobbegong&dt=2019-10-10%2020%3A54%3A46
>>
>> The tail end of the system log looks like
>>
>> 2019-10-10 21:00:33.717 UTC [15127:306] pg_regress/date FATAL: postmaster exited during a parallel transaction
>> 2019-10-10 21:00:33.717 UTC [15127:307] pg_regress/date LOG: disconnection: session time: 0:00:02.896 user=fedora database=regression host=[local]
>> /bin/sh: line 1: 14168 Segmentation fault (core dumped) "/home/fedora/build-farm-10-clang/buildroot/HEAD/pgsql.build/tmp_install/home/fedora/build-farm-clang/buildroot/HEAD/inst/bin/postgres" -F -c listen_addresses="" -k "/tmp/pg_upgrade_check-ZrhQ4h"
>> postmaster exit status is 139
>>
>> So that's definitive proof that the postmaster is suffering a SIGSEGV.
>> Unfortunately, we weren't blessed with a stack trace, even though
>> wobbegong is running a buildfarm client version that is new enough
>> to try to collect one. However, seeing that wobbegong is running
>> a pretty-recent Fedora release, the odds are that systemd-coredump
>> has commandeered the core dump and squirreled it someplace where
>> we can't find it.
>
>
> At least on F29 I have set /proc/sys/kernel/core_pattern and it works.

I have done the same on this machine. wobbegong runs every hour, so
let's see what happens next. With any luck the buildfarm will give us a
stack trace without needing further action.
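
For reference, the two checks discussed above can be sketched in Python.
This is only an illustration under stated assumptions: the helper names
are hypothetical, the 128+N rule is the usual shell convention for a
child killed by signal N, and a core_pattern whose first character is
'|' means the kernel pipes the dump to a helper program rather than
writing a core file itself.

```python
import signal
from pathlib import Path

# Kernel setting mentioned in the thread.
CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def signal_from_shell_status(status):
    """Return the signal name for a shell-style status of 128+N, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # 128+N where N is not a known signal number on this platform.
        return None


def core_pattern_is_piped(pattern):
    """True if the pattern hands core dumps to a helper program."""
    return pattern.startswith("|")


def read_core_pattern(path=CORE_PATTERN_PATH):
    """Read the current pattern; only meaningful on Linux."""
    return path.read_text().strip()


if __name__ == "__main__":
    # 139 is the postmaster exit status from the log quoted above.
    print("status 139 ->", signal_from_shell_status(139))
    if CORE_PATTERN_PATH.exists():
        pattern = read_core_pattern()
        kind = "piped to a helper" if core_pattern_is_piped(pattern) else "written as a file"
        print("core_pattern:", pattern, "(%s)" % kind)
```

If the pattern turns out to be piped, the core file will not appear in
the data directory where the buildfarm client looks for it, which would
match the missing stack trace described above.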

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
