Re: stress test for parallel workers

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stress test for parallel workers
Date: 2019-08-07 14:30:51
Message-ID: af515ace-4956-e720-5c6e-0c3743723dcf@iki.fi
Lists: pgsql-hackers

On 07/08/2019 16:57, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
>> On 07/08/2019 02:57, Thomas Munro wrote:
>>> On Wed, Jul 24, 2019 at 5:15 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> So I think I've got to take back the assertion that we've got
>>>> some lurking generic problem. This pattern looks way more
>>>> like a platform-specific issue. Overaggressive OOM killer
>>>> would fit the facts on vulpes/wobbegong, perhaps, though
>>>> it's odd that it only happens on HEAD runs.
>
>>> chipmunk also:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2019-08-06%2014:16:16
>
>> FWIW, I looked at the logs in /var/log/* on chipmunk, and found no
>> evidence of OOM killings. I can see nothing unusual in the OS logs
>> around the time of that failure.
>
> Oh, that is very useful info, thanks. That seems to mean that we
> should be suspecting a segfault, assertion failure, etc inside
> the postmaster. I don't see any TRAP message in chipmunk's log,
> so assertion failure seems to be ruled out, but other sorts of
> process-crashing errors would fit the facts.
>
> A stack trace from the crash would be mighty useful info along
> about here. I wonder whether chipmunk has the infrastructure
> needed to create such a thing. From memory, the buildfarm requires
> gdb for that, but not sure if there are additional requirements.

It does have gdb installed.

> Also, if you're using systemd or something else that thinks it
> ought to interfere with where cores get dropped, that could be
> a problem.

I think they should just go to a file called "core"; at least, I don't
think I've changed any settings related to that. I tried "find / -name
core*", but didn't find any core files.
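
For reference, a minimal sketch of how one might check where the kernel
sends core files and whether they are enabled at all. It assumes a Linux
host that exposes /proc/sys/kernel/core_pattern; the helper names
(classify_core_pattern, core_dumps_enabled) are made up for illustration.
The limit it reports is that of the process running the script, which is
not necessarily what the postmaster was started with.

```python
import resource
from pathlib import Path

# Linux-specific location of the kernel's core file template (assumption:
# the host is Linux; other platforms configure this differently).
CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def classify_core_pattern(pattern: str) -> str:
    """Roughly classify a core_pattern value (hypothetical helper)."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # The core is piped to a helper program (e.g. systemd-coredump),
        # so no plain "core" file appears on disk.
        return "piped"
    if pattern.startswith("/"):
        # Cores are written using an absolute path template.
        return "absolute"
    # Otherwise the template is relative to the crashing process's
    # current working directory.
    return "cwd-relative"


def core_dumps_enabled() -> bool:
    """True if this process's soft RLIMIT_CORE allows writing a core."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


if __name__ == "__main__":
    if CORE_PATTERN_FILE.exists():
        pattern = CORE_PATTERN_FILE.read_text()
        print("core_pattern:", pattern.strip(), "->", classify_core_pattern(pattern))
    else:
        print("no", CORE_PATTERN_FILE, "on this system")
    print("core dumps enabled for this process:", core_dumps_enabled())
```

If the soft limit is 0, or the pattern is piped to a helper, a search
for files named core* would be expected to come up empty even after a
crash.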

- Heikki
