Re: pg11.1: dsa_area could not attach to segment

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Sergei Kornilov <sk(at)zsrv(dot)org>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg11.1: dsa_area could not attach to segment
Date: 2019-02-14 21:00:12
Message-ID: CAEepm=39xAYNASZA3Ji-QX6qJz8o7OqUL9t1_sZ-FTfoKrRZZA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 15, 2019 at 5:36 AM Sergei Kornilov <sk(at)zsrv(dot)org> wrote:
> > Do you think that plausibly explains and resolves symptoms of bug#15585, too?
>
> I think yes. Bug#15585 raised only after "dsa_area could not attach to segment" in different parallel worker. Leader stuck because waiting all parallel workers, but one worker has unexpected recursion in dsm_backend_shutdown [1] and will never shutdown. Backtrace show previous error in this backend: "cannot unpin a segment that is not pinned" - root cause is earlier and in a different process.

Agreed. Even though it's an unpleasant failure mode, I'm not entirely
sure if it's a good idea to make changes to avoid it. We could move
the code around so that the error is raised after releasing the lock,
but then you'd either blow the stack or loop forever due to longjmp (I
haven't checked which). To avoid that you'd have to clean out the
book-keeping in shared memory eagerly so that at the next level of
error recursion you've at least made progress (and admittedly there
are examples of things like that in the code), but how far should we
go to tolerate cases that shouldn't happen? Practically, if we had
that behaviour and this bug, you'd eventually eat all the DSM slots
with leaked segments of shared memory, and your system wouldn't work
too well. For now I think it's better to treat the root cause of the
unexpected error.
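
To illustrate the "clean out the book-keeping eagerly" idea, here is a toy sketch in plain C. It is not PostgreSQL code: toy_detach, toy_backend_shutdown, run_shutdown and the attached[] array are all hypothetical names, and setjmp/longjmp merely stands in for the real error handling. The assumption is that detaching segment 1 always raises an error and that each error re-enters the shutdown routine.

```c
#include <assert.h>
#include <setjmp.h>

#define NSEGMENTS 3

static jmp_buf error_context;    /* stand-in for the error longjmp target */
static int attached[NSEGMENTS];  /* toy book-keeping: 1 = still attached */
static int detach_attempts;      /* how many detaches were started */
static int error_count;          /* how many errors were raised */

/* Hypothetical detach: segment 1 always fails (assumption for the demo). */
static void
toy_detach(int i)
{
    assert(i >= 0 && i < NSEGMENTS);
    detach_attempts++;

    /*
     * Clear the book-keeping BEFORE anything that can raise an error, so
     * that a re-entrant shutdown skips this segment instead of retrying it.
     */
    attached[i] = 0;

    if (i == 1)
        longjmp(error_context, 1);  /* simulated unexpected error */
}

/* Detach everything that the book-keeping still lists as attached. */
static void
toy_backend_shutdown(void)
{
    for (int i = 0; i < NSEGMENTS; i++)
        if (attached[i])
            toy_detach(i);
}

/* Run shutdown, re-entering it after every error; returns the error count. */
static int
run_shutdown(void)
{
    for (int i = 0; i < NSEGMENTS; i++)
        attached[i] = 1;
    detach_attempts = 0;
    error_count = 0;

    if (setjmp(error_context) != 0)
        error_count++;              /* we got here via longjmp */

    toy_backend_shutdown();         /* first pass, or retry after an error */
    return error_count;
}
```

Calling run_shutdown() here terminates after one error, because the retry finds segment 1 already cleared and moves on to segment 2. If the attached[i] = 0 assignment were moved after the longjmp, the retry would hit segment 1 again every time and never finish. Note that the sketch only forgets the failed segment; it does not release it, which is the leak described above.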

--
Thomas Munro
http://www.enterprisedb.com
