Re: intermittent failures in Cygwin from select_parallel tests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: intermittent failures in Cygwin from select_parallel tests
Date: 2017-06-15 15:05:40
Message-ID: CA+TgmoYaqJQKtvvbATFzsTsWVZkoB-rff16Ts4osn0fCzVe=CA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 15, 2017 at 10:21 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Yes, I think it is for next query. If you refer the log below from lorikeet:
>
> 2017-06-13 16:44:57.179 EDT [59404ec6.2758:63] LOG: statement:
> EXPLAIN (analyze, timing off, summary off, costs off) SELECT * FROM
> tenk1;
> 2017-06-13 16:44:57.247 EDT [59404ec9.2e78:1] ERROR: could not map
> dynamic shared memory segment
> 2017-06-13 16:44:57.248 EDT [59404dec.2d9c:5] LOG: worker process:
> parallel worker for PID 10072 (PID 11896) exited with exit code 1
> 2017-06-13 16:44:57.273 EDT [59404ec6.2758:64] LOG: statement: select
> stringu1::int2 from tenk1 where unique1 = 1;
> TRAP: FailedAssertion("!(BackgroundWorkerData->parallel_register_count
> - BackgroundWorkerData->parallel_terminate_count <= 1024)", File:
> "/home/andrew/bf64/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/bgworker.c",
> Line: 974)
> 2017-06-13 16:45:02.652 EDT [59404dec.2d9c:6] LOG: server process
> (PID 10072) was terminated by signal 6: Aborted
> 2017-06-13 16:45:02.652 EDT [59404dec.2d9c:7] DETAIL: Failed process
> was running: select stringu1::int2 from tenk1 where unique1 = 1;
> 2017-06-13 16:45:02.652 EDT [59404dec.2d9c:8] LOG: terminating any
> other active server processes
>
> Error "could not map dynamic shared memory segment" is due to query
> "EXPLAIN .. SELECT * FROM tenk1" and Assertion failure is due to
> another statement "select stringu1::int2 from tenk1 where unique1 =
> 1;".

I think you're right. So here's a theory:

1. The ERROR mapping the DSM segment is just a case of the worker
losing a race, and isn't a bug.

2. But when that happens, parallel_terminate_count is getting bumped
twice for some reason.

3. So then the leader process fails that assertion when it tries to
launch the parallel workers for the next query.
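
To illustrate why a double bump in step 2 would trip the assertion in
step 3, here is a minimal sketch. It assumes the two counters are
unsigned 32-bit integers and that the limit is the 1024 shown in the
assertion text. WorkerCounters, WORKER_LIMIT, counters_are_sane and
run_one_worker are hypothetical names made up for illustration; this is
not the bgworker.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the limit in the assertion quoted above. */
#define WORKER_LIMIT 1024

/* Hypothetical stand-in for the shared counters; assumed unsigned 32-bit. */
typedef struct
{
    uint32_t parallel_register_count;
    uint32_t parallel_terminate_count;
} WorkerCounters;

/* The condition the assertion checks: registered minus terminated <= limit. */
static bool
counters_are_sane(const WorkerCounters *c)
{
    /* Unsigned subtraction wraps around if terminate exceeds register. */
    uint32_t active = (uint32_t) (c->parallel_register_count -
                                  c->parallel_terminate_count);

    return active <= WORKER_LIMIT;
}

/* Register one worker, then record its termination `bumps` times. */
static WorkerCounters
run_one_worker(WorkerCounters c, int bumps)
{
    c.parallel_register_count++;
    for (int i = 0; i < bumps; i++)
        c.parallel_terminate_count++;
    return c;
}
```

With one bump per worker the difference is 0 and the check holds. With
two bumps, 1 - 2 wraps around to a very large unsigned value, which
exceeds the limit, so the check fails the next time anything evaluates
it.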

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
