Re: pg11.1: dsa_area could not attach to segment

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Sergei Kornilov <sk(at)zsrv(dot)org>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg11.1: dsa_area could not attach to segment
Date: 2019-02-14 12:12:35
Message-ID: CAEepm=22ug5qu5ZgHnbE9KUioKS7Dtv929tzCHN821gXFKaB_A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 12, 2019 at 10:15 PM Sergei Kornilov <sk(at)zsrv(dot)org> wrote:
> I still have error with parallel_leader_participation = off.

Justin very kindly set up a virtual machine similar to the one where
he'd seen the problem so I could experiment with it. Eventually I
also managed to reproduce it locally, and have finally understood the
problem.

It doesn't happen on master (hence some of my initial struggle to
reproduce it) because of commit 197e4af9, which added srandom() to set
a different seed for each parallel worker. Perhaps you see where
this is going already...

The problem is that a DSM handle (i.e. a random number) can be reused
for a new segment immediately after the shared memory object has been
destroyed but before the DSM slot has been released. Now two DSM
slots have the same handle, and dsm_attach() can be confused by the
old segment and give up.
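
To make the failure mode concrete, here is a minimal standalone sketch of
a slot table in which two slots carry the same handle. It is a simplified
model, not the PostgreSQL code and not necessarily what the attached
patch does: the names slot_t, find_slot_give_up() and
find_slot_keep_searching() are hypothetical, and the refcnt convention
(0 = unused, 1 = being destroyed, 2 or more = live) is an assumption
made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS  4
#define NO_SLOT (-1)

/* Hypothetical, simplified stand-in for one DSM control slot. */
typedef struct
{
    uint32_t handle;   /* random identifier of the segment */
    int      refcnt;   /* assumed: 0 = unused, 1 = being destroyed, >= 2 = live */
} slot_t;

/*
 * Lookup that stops at the first slot whose handle matches.  If that
 * slot belongs to a segment that is still being torn down, the search
 * gives up, even though a live slot with the same handle may follow.
 */
static int
find_slot_give_up(const slot_t *slots, size_t n, uint32_t handle)
{
    for (size_t i = 0; i < n; i++)
    {
        if (slots[i].refcnt == 0)
            continue;           /* unused slot */
        if (slots[i].handle != handle)
            continue;           /* some other segment */
        if (slots[i].refcnt == 1)
            return NO_SLOT;     /* dying segment: give up */
        return (int) i;
    }
    return NO_SLOT;
}

/*
 * Lookup that skips unused and dying slots before comparing handles,
 * so a recycled handle in a later slot can still be found.
 */
static int
find_slot_keep_searching(const slot_t *slots, size_t n, uint32_t handle)
{
    for (size_t i = 0; i < n; i++)
    {
        if (slots[i].refcnt <= 1)
            continue;           /* unused or dying: keep looking */
        if (slots[i].handle != handle)
            continue;
        return (int) i;
    }
    return NO_SLOT;
}
```

With slot 0 holding handle 42 at refcnt 1 (old segment, not yet released)
and slot 1 holding handle 42 at refcnt 2 (new segment), the first lookup
reports no match while the second returns slot 1.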

Here's a draft patch to fix that. It also clears the handle in a case
where it wasn't previously cleared, but that wasn't strictly
necessary. It just made debugging less confusing.

--
Thomas Munro
http://www.enterprisedb.com

Attachment: 0001-Fix-race-in-dsm_attach-when-handles-are-recycled.patch (application/x-patch, 2.1 KB)
