Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date: 2021-03-01 10:06:40
Message-ID: 3B717AA0-744C-4D87-8C5B-B9CA62B5D3F5@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 1 Mar 2021, at 05:46, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

>> The 0003 patch to achieve $SUBJECT needs
>> more discussion.
>
> Rebased.
>
> The more I think about it, the more I think that this approach is good
> enough for an initial solution to the problem. It only affects
> Windows, dropping tablespaces is hopefully rare, and it's currently
> broken on that OS. That said, it's complex enough, and I guess more
> to the point, enough of a compromise, that I'm hoping to get some
> explicit consensus about that.
>
> A better solution would probably have to be based on the sinval queue,
> somehow. Perhaps with a new theory or rule making it safe to process
> at every CFI(), or by deciding that we're prepared have the operation
> wait arbitrarily long until backends reach a point where it is known
> to be safe (probably near ProcessClientReadInterrupt()'s call to
> ProcessCatchupInterrupt()), or by inventing a new kind of lightweight
> "sinval peek" that is safe to run at every CFI() point, being based on
> reading (but not consuming!) the sinval queue and performing just
> vfd-close of referenced smgr relations in this case. The more I think
> about all that complexity for a super rare event on only one OS, the
> more I want to just do it the stupid way and close all the fds.
> Robert opined similarly in an off-list chat about this problem.

I don't know Windows at all so I can't really comment on that portion, but from
my understanding of procsignalbarriers I think this seems right. No tests
break when forcing the codepath to run on Linux and macOS.

Should this be performed in tblspc_redo as well for the similar case?

+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)

Is the USE_ASSERT_CHECKING clause to exercise the code a more frequent than
just on Windows? That could warrant a quick word in the comment if so IMO to
avoid confusion.

-ProcessBarrierPlaceholder(void)
+ProcessBarrierSmgrRelease(void)
{
- /*
- * XXX. This is just a placeholder until the first real user of this
- * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
- * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
- * appropriately descriptive. Get rid of this function and instead have
- * ProcessBarrierSomethingElse. Most likely, that function should live in
- * the file pertaining to that subsystem, rather than here.
- *
- * The return value should be 'true' if the barrier was successfully
- * absorbed and 'false' if not. Note that returning 'false' can lead to
- * very frequent retries, so try hard to make that an uncommon case.
- */
+ smgrrelease();

Should this instead be in smgr.c to avoid setting a precedent for procsignal.c
to be littered with absorption functions?

--
Daniel Gustafsson https://vmware.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-03-01 10:10:43 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Julien Rouhaud 2021-03-01 09:33:24 Re: archive_command / pg_stat_archiver & documentation