Re: psql not responding to SIGINT upon db reconnection

From: "Tristan Partin" <tristan(at)neon(dot)tech>
To: "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>, "Shlok Kyal" <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: psql not responding to SIGINT upon db reconnection
Date: 2023-11-29 17:48:51
Message-ID: CXBHH2O58HVA.JUP07AES8JVK@neon.tech
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu Nov 23, 2023 at 3:19 AM CST, Heikki Linnakangas wrote:
> On 22/11/2023 23:29, Tristan Partin wrote:
> > Ha, you're right. I had this working yesterday, but convinced myself it
> > didn't. I had a do while loop wrapping the blocking call. Here is a v4,
> > which seems to pass the tests that were pointed out to be failing
> > earlier.
>
> Thanks! This suffers from a classic race condition:
>
> > + if (cancel_pressed)
> > + break;
> > +
> > + sock = PQsocket(n_conn);
> > + if (sock == -1)
> > + break;
> > +
> > + rc = pqSocketPoll(sock, for_read, !for_read, (time_t)-1);
> > + Assert(rc != 0); /* Timeout is impossible. */
> > + if (rc == -1)
> > + {
> > + success = false;
> > + break;
> > + }
>
> If a signal arrives between the "if (cancel_pressed)" check and
> pqSocketPoll(), pgSocketPoll will "miss" the signal and block
> indefinitely. There are three solutions to this:
>
> 1. Use a timeout, so that you wake up periodically to check for any
> missed signals. Easy solution, the downsides are that you will not react
> as quickly if the signal is missed, and you waste some cycles by waking
> up unnecessarily.
>
> 2. The Self-pipe trick: https://cr.yp.to/docs/selfpipe.html. We also use
> that in src/backend/storage/ipc/latch.c. It's portable, but somewhat
> complicated.
>
> 3. Use pselect() or ppoll(). They were created specifically to address
> this problem. Not fully portable, I think it's missing on Windows at least.
>
> Maybe use pselect() if it's available, and select() with a timeout if
> it's not.

First I have ever heard of this race condition, and now I will commit it
to durable storage :). I went ahead and did option #3 that you proposed.
On Windows, I have a 1 second timeout for the select/poll. That seemed
like a good balance of user experience and spinning the CPU. But I am
open to other values. I don't have a Windows VM, but maybe I should set
one up...

I am not completely in love with the code I have written. Lots of
conditional compilation which makes it hard to read. Looking forward to
another round of review to see what y'all think.

For what it's worth, I did try #2, but since psql installs its own
SIGINT handler, the code became really hairy.

--
Tristan Partin
Neon (https://neon.tech)

Attachment Content-Type Size
v5-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patch text/x-patch 8.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-11-29 17:52:01 Re: Fix assertion in autovacuum worker
Previous Message Andres Freund 2023-11-29 17:42:35 Re: remaining sql/json patches