Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query

From: Sergei Kornilov <sk(at)zsrv(dot)org>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Maksim Milyutin <milyutinma(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
Date: 2019-01-24 21:46:04
Message-ID: 4543511548366364@myt6-fe24916a5562.qloud-c.yandex.net
Lists: pgsql-bugs

Hi

> I think the root cause is earlier and in a different process (see
> ProcessInterrupt() in the stack). Presumably one that reported
> "dsa_area could not attach to segment" is closer to the point where
> things go wrong.

I agree.
The log was:
2019-01-23 20:31:33.508 UTC 2593 @ from [vxid:121/1810816 txid:0] [] ERROR: dsa_area could not attach to segment
2019-01-23 20:31:33.508 UTC 2593 @ from [vxid:121/1810816 txid:0] [] STATEMENT: SELECT ...
2019-01-23 20:31:33.510 UTC 22941 @ from [vxid: txid:0] [] LOG: worker process: parallel worker for PID 31224 (PID 2593) exited with exit code 1
2019-01-23 20:31:33.511 UTC 31224 user(at)db from 127.0.0.1 [vxid:20/25544096 txid:0] [SELECT] ERROR: dsa_area could not attach to segment
2019-01-23 20:31:33.511 UTC 31224 user(at)db from 127.0.0.1 [vxid:20/25544096 txid:0] [SELECT] STATEMENT: SELECT ...
2019-01-23 20:31:33.511 UTC 2595 @ from [vxid:138/277023 txid:0] [] FATAL: terminating connection due to administrator command
2019-01-23 20:31:33.511 UTC 2595 @ from [vxid:138/277023 txid:0] [] STATEMENT: SELECT ...
2019-01-23 20:31:33.511 UTC 2594 @ from [vxid:136/536348 txid:0] [] ERROR: dsa_area could not attach to segment
2019-01-23 20:31:33.511 UTC 2594 @ from [vxid:136/536348 txid:0] [] STATEMENT: SELECT ...
2019-01-23 20:31:33.512 UTC 2595 @ from [vxid:138/0 txid:0] [] FATAL: cannot unpin a segment that is not pinned
2019-01-23 20:31:33.513 UTC 22941 @ from [vxid: txid:0] [] LOG: worker process: parallel worker for PID 31224 (PID 2594) exited with exit code 1

22941 is the postmaster, 31224 is the parallel leader, 2595 is a parallel worker (the backtrace was from this process), and 2593 and 2594 seem to be other parallel workers for this query.
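
To make the per-process ordering easier to see, here is a minimal sketch that groups log lines like the ones above by PID. It assumes the log_line_prefix layout visible in this excerpt (timestamp, "UTC", PID, then the severity before the first ": "). LINE_RE, group_by_pid and SAMPLE_LOG are illustrative names only, not part of PostgreSQL, and the sample is just three lines copied from the log above.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the excerpt above:
# "<date> <time> UTC <pid> ... <SEVERITY>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC (?P<pid>\d+) "
    r".*? (?P<severity>ERROR|FATAL|LOG|STATEMENT): (?P<message>.*)$"
)


def group_by_pid(lines):
    """Return {pid: [(timestamp, severity, message), ...]} in log order.

    STATEMENT lines are skipped because they only repeat the query text.
    Lines that do not match the assumed layout are ignored.
    """
    events = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("severity") == "STATEMENT":
            continue
        events[int(m.group("pid"))].append(
            (m.group("ts"), m.group("severity"), m.group("message"))
        )
    return dict(events)


# Three lines copied verbatim from the log above.
SAMPLE_LOG = """\
2019-01-23 20:31:33.508 UTC 2593 @ from [vxid:121/1810816 txid:0] [] ERROR: dsa_area could not attach to segment
2019-01-23 20:31:33.511 UTC 2595 @ from [vxid:138/277023 txid:0] [] FATAL: terminating connection due to administrator command
2019-01-23 20:31:33.512 UTC 2595 @ from [vxid:138/0 txid:0] [] FATAL: cannot unpin a segment that is not pinned
"""

events = group_by_pid(SAMPLE_LOG.splitlines())
for pid, entries in events.items():
    for ts, severity, message in entries:
        print(pid, ts, severity, message)
```

Grouped this way, 2595 is the only process that reports two FATAL messages, which is what makes me suspect the shutdown path was entered twice.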

But I was surprised by a possible double call of dsm_backend_shutdown. Could the first process (2593 in my case) have been terminated for some reason (and therefore have called shmem_exit) but be logged only with the second error?

> If you are in a position to reproduce this

Unfortunately no, I cannot run this database with modified source. And I still cannot reproduce this bug by hand. The modified source has been running on my local host for a few days, without any luck...

regards, Sergei
