Re: dropdb --force

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ryan Lambert <ryan(at)rustprooflabs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Filip Rembiałkowski <filip(dot)rembialkowski(at)gmail(dot)com>
Subject: Re: dropdb --force
Date: 2019-10-24 09:09:52
Message-ID: CAA4eK1K+PePydjuDZO1bDtB7tARAf72a4soJxTz=PoH6LgeZ5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 23, 2019 at 12:59 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Oct 22, 2019 at 4:51 PM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> >
> > út 22. 10. 2019 v 5:09 odesílatel Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> napsal:
> >>
> >>
> >> CountOtherDBBackends is called from other places as well, so I don't
> >> think it is advisable to change the sleep time in that function.
> >> Also, I don't want to add a parameter for it. I think you have a
> >> point that in some cases we might end up sleeping for 100ms when we
> >> could do with less sleeping time, but I think it is true to some
> >> extent today as well. I think we can anyway change it in the future
> >> if there is a problem with the sleep timing, but for now, I think we
> >> can just call CountOtherDBBackends after sending SIGTERM and call it
> >> good. You might want to add a futuristic note in the code.
> >>
> >
> > ok.
> >
> > I removed sleeping from TerminateOtherDBBackends().
> >
> > If you want to change any logic there, please, do it without any hesitations. Maybe I don't see, what you think.
> >
>
> Fair enough, I will see if I need to change anything.
>

While making some changes in the patch, I noticed that in
TerminateOtherDBBackends, there is a race condition where after we
release the ProcArrayLock, the backend process to which we decided to
send a signal exits by itself and the same pid can be assigned to
another backend which is connected to some other database. This leads
to terminating a wrong backend. I think we have some similar race
condition at few other places like in pg_terminate_backend(),
ProcSleep() and CountOtherDBBackends(). I think here the risk is a
bit more because there could be a long list of pids.

One idea could be that we write a new function similar to IsBackendPid
which takes dbid and ensures that pid belongs to that database and use
that before sending kill signal, but still it will not be completely
safe. But, I think it can be closer to cases like we already have in
code.

Another possible idea could be to use the SendProcSignal mechanism
similar to how we have used it in CancelDBBackends() to allow the
required backends to exit by themselves. This might be safer.

I am not sure if we can entirely eliminate this race condition and
whether it is a good idea to accept such a race condition even though
it exists in other parts of code. What do you think?

BTW, I have added/revised some comments in the code and done few other
cosmetic changes, the result of which is attached.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
drop-database-force-20191024.amit.patch application/octet-stream 15.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-10-24 09:13:48 Re: Ordering of header file inclusion
Previous Message Dilip Kumar 2019-10-24 08:27:45 Re: Fix of fake unlogged LSN initialization