Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date: 2021-03-20 04:47:47
Message-ID: CA+hUKGJ8gSaCcu8ky-UBtdAfyHRGwU9zEgsXQH5SuV3iOLaMGQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 6, 2021 at 12:10 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> > On 3 Mar 2021, at 23:19, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > That's why I release and reacquire that LWLock. But does that break some
> > logic?
>
> One clear change to current behavior is naturally that a concurrent
> TablespaceCreateDbspace can happen while barrier absorption is performed.
> Given where we are that might not be a problem, but I don't have enough
> caffeine at the moment to conclude anything there. Testing by inducing
> concurrent calls while absorption was stalled didn't trigger anything, but I'm
> sure I didn't test every scenario. Do you see anything off the cuff?
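A minimal sketch of the release-and-reacquire pattern under discussion, with the concern above made explicit. It assumes a pthread mutex as a stand-in for the LWLock. wait_for_barrier_stub() and dbspace_generation are hypothetical names invented for illustration and are not PostgreSQL functions or variables. The only point is that any state guarded by the lock may have changed while it was released, so the caller has to re-check after reacquiring it.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the LWLock being released; not the real PostgreSQL lock. */
static pthread_mutex_t tablespace_create_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical counter that a creator of database directories would bump
 * while holding the lock. Nothing in this sketch modifies it. */
static unsigned long dbspace_generation = 0;

/* Hypothetical placeholder for "wait until all backends absorb the barrier".
 * A real implementation would block; this stub returns immediately. */
static void wait_for_barrier_stub(void)
{
}

/*
 * Must be called with tablespace_create_lock held. Releases it for the
 * duration of the wait, then reacquires it. Returns true if the protected
 * state changed while the lock was not held, meaning the caller must redo
 * any checks it made before the wait.
 */
static bool wait_with_lock_released(void)
{
    unsigned long before = dbspace_generation;  /* read under the lock */

    pthread_mutex_unlock(&tablespace_create_lock);
    wait_for_barrier_stub();                    /* window: others may run */
    pthread_mutex_lock(&tablespace_create_lock);

    return dbspace_generation != before;
}
```

A caller would hold the lock, call wait_with_lock_released(), and repeat its directory cleanup if it returns true. The lock is held again on return either way.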

Now I may have the opposite problem (too much coffee), but it looks
like it should work about as well as it does today. At the new point
where we release the LWLock, all we've really done is possibly unlink
some empty database directories in destroy_tablespace_directories(),
and that's harmless: they'll be recreated on demand if we abandon
ship. If TablespaceCreateDbspace() runs while we are absorbing the
barrier without holding the lock in this new code, then a concurrent
mdcreate() is in progress, and we have a race: we'll again try to drop
all empty directories, it will try to create its relfile in the newly
created (still empty) directory, and one of us will fail (possibly
with an ugly ENOENT error message). But that's already the case in the
master branch: mdcreate() could have run TablespaceCreateDbspace()
before we acquire the lock, and (with pathological enough scheduling)
it could reach its attempt to create its relfile after DropTableSpace()
has unlinked the empty directory.
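The race described above can be modeled deterministically as a toy state machine over a single per-database directory. This is an illustrative model only: struct dbspace, apply_step() and run_schedule() are hypothetical, and the three steps only roughly correspond to the mkdir(), open(..., O_CREAT) and rmdir() work done by TablespaceCreateDbspace(), mdcreate() and destroy_tablespace_directories().

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of one per-database directory inside a tablespace. */
struct dbspace {
    bool dir_exists;
    int  nfiles;
};

enum step { CREATE_DBSPACE, CREATE_RELFILE, DROP_EMPTY_DIRS };

/* Apply one step; returns 0 or an errno-style code, like the syscall would. */
static int apply_step(struct dbspace *s, enum step st)
{
    switch (st) {
    case CREATE_DBSPACE:      /* roughly TablespaceCreateDbspace(): mkdir */
        s->dir_exists = true;
        return 0;
    case CREATE_RELFILE:      /* roughly mdcreate(): create file in the dir */
        if (!s->dir_exists)
            return ENOENT;
        s->nfiles++;
        return 0;
    case DROP_EMPTY_DIRS:     /* roughly destroy_tablespace_directories() */
        if (s->dir_exists && s->nfiles > 0)
            return ENOTEMPTY;
        s->dir_exists = false;  /* rmdir the empty directory, if present */
        return 0;
    }
    return EINVAL;
}

/* Run a schedule from a fresh state; return the first error, or 0. */
static int run_schedule(const enum step *steps, int n)
{
    struct dbspace s = { false, 0 };

    for (int i = 0; i < n; i++) {
        int err = apply_step(&s, steps[i]);
        if (err != 0)
            return err;
    }
    return 0;
}
```

With the schedule create-dir, drop-empty-dirs, create-relfile, the relfile creation fails with ENOENT. With create-dir, create-relfile, drop-empty-dirs, it is the drop that fails because the directory is no longer empty. Either way one side loses, which is the outcome described for both the patch and master.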

The interlocking here is hard to follow. I wonder why we don't use
heavyweight locks to do per-tablespace interlocking between
DefineRelation() and DropTableSpace(). I'm sure this question is
hopelessly naive and I should probably go and read some history.
