Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: sk(at)zsrv(dot)org, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
Date: 2019-01-10 21:04:18
Message-ID: CAEepm=0aPq2yEy39gEqVK2m_Qi6jJdy96ysHGJ6VSHOZFz+xbg@mail.gmail.com
Lists: pgsql-bugs

Hi Sergei,

Thanks for the report and backtraces. If this is a production system,
the workaround would be to disable parallel query for the problem
queries (max_parallel_workers_per_gather = 0).
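
As a rough sketch of how that setting could be applied (the role name
app_user below is hypothetical, and which scope is appropriate depends
on how the problem queries are issued):

```sql
-- Per session: run this before the affected query.
SET max_parallel_workers_per_gather = 0;

-- Or persistently for a single role; "app_user" is a placeholder name.
-- Takes effect for new sessions started by that role.
ALTER ROLE app_user SET max_parallel_workers_per_gather = 0;
```

With the value at 0 the planner will not choose Gather plans, so those
queries run without parallel workers.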

On Thu, Jan 10, 2019 at 11:14 PM PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
> It seems a bit similar to BUG #15290 [1], but this server was already 10.5
> and another LWLock.

Although the symptoms were a little similar, and you did mention
further down that you are using Parallel Index Scan, that bug was
fixed, and the stack looked rather different (you could see the index
scan itself was blocked).

> dsa_area could not attach to segment

This was also reported recently in 11.1 [1] and seems to imply a
problem with the recycling of DSM segment slots in DSA areas, or some
other kind of concurrency problem. I am trying to reproduce it.

> Postgresql installed from PGDG repo, no external modules are installed.
> Queries uses one native partitioned table with Parallel Index Scan or
> Parallel Bitmap Heap Scan. Query itself works ok, i can not reproduce this
> issue by hand.
> Same log output "dsa_area could not attach to segment" + "cannot unpin a
> segment that is not pinned" was 1 jan.

Yeah, it does seem to be rare; Justin mentioned off-list that he did
manage to reproduce the problem he reported in the other thread by
running his query in a tight loop for 8 hours.

> Also it is expected have CHECK_FOR_INTERRUPTS in
> WaitForBackgroundWorkerShutdown with only one caller in
> DestroyParallelContext:
> /*
> * We can't finish transaction commit or abort until all of the workers
> * have exited. This means, in particular, that we can't respond to
> * interrupts at this stage.
> */
> HOLD_INTERRUPTS();
> WaitForParallelWorkersToExit(pcxt);
> RESUME_INTERRUPTS();
>
> Can parallel worker wait something response from leader?

The fact that the leader waits without interrupts here is a separate
topic, and I doubt we should change that; the real problem is that
something went wrong in a worker and I need to figure out what and fix
that.

[1] https://www.postgresql.org/message-id/flat/20181231221734.GB25379%40telsasoft.com

--
Thomas Munro
http://www.enterprisedb.com
