Re: src/test/subscription/t/002_types.pl hanging on particular environment

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: src/test/subscription/t/002_types.pl hanging on particular environment
Date: 2017-09-18 15:14:46
Message-ID: CAA4eK1JQw=eiFGARrVVDgimbz+65MgHPS4L3hg-ujxAwWXeCjA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 18, 2017 at 7:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> In this build you can see the output of the following at the end,
>> which might provide clues to the initiated. You might need to click a
>> small triangle to unfold the commands' output.
>
>> cat ./src/test/subscription/tmp_check/log/002_types_publisher.log
>> cat ./src/test/subscription/tmp_check/log/002_types_subscriber.log
>> cat ./src/test/subscription/tmp_check/log/regress_log_002_types
>
> The subscriber log includes
> 2017-09-18 08:43:08.240 UTC [15672] WARNING: out of background worker slots
> 2017-09-18 08:43:08.240 UTC [15672] HINT: You might need to increase max_worker_processes.
>
> Maybe that's harmless, but I'm suspicious that it's a smoking gun.
> I think perhaps this reflects a failed attempt to launch a worker,
> which the caller does not realize has failed to launch because of the
> lack of worker-fork-failure error recovery I bitched about months ago
> [1], leading to subscription startup waiting forever for a worker that's
> never going to report finishing.
>
> I see Amit K. just posted a patch in that area [2], haven't looked at it
> yet.
>

While fixing the issue you are talking about, I noticed that this path
is also susceptible to such problems. WaitForReplicationWorkerAttach()
relies on GetBackgroundWorkerPid() to learn the status of the worker,
which won't report the correct status if the fork fails. The patch I
have posted [1] includes a fix for that. However, judging from your
next email in this thread, this could be an entirely different issue.
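
To make the failure mode above concrete, here is a toy model in C. It is
not PostgreSQL code: ToySlot, toy_get_status() and toy_wait_for_attach()
are hypothetical names invented for illustration, and the poll limit
exists only so the demo terminates where a real wait loop would keep
going. The extra fork-failure check is just one possible way to surface
the error and is not claimed to be what the posted patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy status values, loosely modeled on the idea of a worker handle status. */
typedef enum { TOY_NOT_YET_STARTED, TOY_STARTED, TOY_STOPPED } ToyWorkerStatus;

typedef enum { TOY_WAIT_ATTACHED, TOY_WAIT_LAUNCH_FAILED, TOY_WAIT_GAVE_UP } ToyWaitResult;

/* Hypothetical worker slot; pid stays 0 until a fork succeeds. */
typedef struct
{
    int  pid;
    bool exited;       /* worker started and later exited */
    bool fork_failed;  /* launcher tried to fork and failed */
    bool attached;     /* worker attached to its slot */
} ToySlot;

/*
 * Status lookup that only looks at pid/exited. A failed fork leaves
 * pid == 0, so it is indistinguishable from "not yet started".
 */
static ToyWorkerStatus
toy_get_status(const ToySlot *slot)
{
    assert(slot != NULL);
    if (slot->exited)
        return TOY_STOPPED;
    return slot->pid == 0 ? TOY_NOT_YET_STARTED : TOY_STARTED;
}

/*
 * Poll until the worker attaches or is known to be gone.
 * max_polls stands in for "forever" so that the demo terminates.
 */
static ToyWaitResult
toy_wait_for_attach(const ToySlot *slot, int max_polls,
                    bool check_fork_failure, int *polls_used)
{
    assert(slot != NULL && polls_used != NULL);
    *polls_used = 0;
    for (int i = 0; i < max_polls; i++)
    {
        (*polls_used)++;
        if (slot->attached)
            return TOY_WAIT_ATTACHED;
        if (toy_get_status(slot) == TOY_STOPPED)
            return TOY_WAIT_LAUNCH_FAILED;
        /* Assumed extra check: notice a fork failure explicitly. */
        if (check_fork_failure && slot->fork_failed)
            return TOY_WAIT_LAUNCH_FAILED;
    }
    return TOY_WAIT_GAVE_UP;
}
```

With a slot whose fork failed, the loop without the extra check uses up
every poll and never learns the worker is gone; with the check it
returns on the first poll.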

[1] - https://www.postgresql.org/message-id/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB%3D3uw%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
