Re: stress test for parallel workers

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stress test for parallel workers
Date: 2019-07-23 23:48:57
Message-ID: CA+hUKGKZS32Lx1zViq0W8omCjHjBsN=dJZ8nX-rkhv1W=b802Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 24, 2019 at 10:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > I suspect that the only thing implicating parallelism in this failure
> > is that parallel leaders happen to print out that message if the
> > postmaster dies while they are waiting for workers; most other places
> > (probably every other backend in your cluster) just quietly exit.
> > That tells us something about what's happening, but on its own doesn't
> > tell us that parallelism plays an important role in the failure mode.
>
> I agree that there's little evidence implicating parallelism directly.
> The reason I'm suspicious about a possible OOM kill is that parallel
> queries would appear to the OOM killer to be eating more resources
> than the same workload non-parallel, so that we might be at more
> hazard of getting OOM'd just because of that.
>
> A different theory is that there's some hard-to-hit bug in the
> postmaster's processing of parallel workers that doesn't apply to
> regular backends. I've looked for one in a desultory way but not
> really focused on it.
>
> In any case, the evidence from the buildfarm is pretty clear that
> there is *some* connection. We've seen a lot of recent failures
> involving "postmaster exited during a parallel transaction", while
> the number of postmaster failures not involving that is epsilon.

I don't have access to the buildfarm history in a searchable format
(I'll go and ask for that). Do you have an example to hand? Does this
failure always happen on Linux?

--
Thomas Munro
https://enterprisedb.com
