Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date: 2024-03-13 19:00:52
Message-ID: 202403131900.6ih7a4dzqefu@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024-Mar-13, Jelte Fennema-Nio wrote:

> I agree it's probably a timing issue. The cancel being received after
> the query is done seems very unlikely, since the query takes 180
> seconds (assuming PG_TEST_TIMEOUT_DEFAULT is not lowered for these
> animals). I think it's more likely that the cancel request arrives too
> early, and thus being ignored because no query is running yet. The
> test already had logic to wait until the query backend was in the
> "active" state, before sending a cancel to solve that issue. But my
> guess is that that somehow isn't enough.
>
> Sadly I'm having a hard time reliably reproducing this race condition
> locally. So it's hard to be sure what is happening here. Attached is a
> patch with a wild guess as to what the issue might be (i.e. seeing an
> outdated "active" state and thus passing the check even though the
> query is not running yet)

I tried leaving the original running in my laptop to see if I could
reproduce it, but got no hits ... and we didn't get any other failures
apart from the three ones already reported ... so it's not terribly high
probability. Anyway I pushed your patch now since the theory seems
plausible; let's see if we still get the issue to reproduce. If it
does, we could make the script more verbose to hunt for further clues.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Here's a general engineering tip: if the non-fun part is too complex for you
to figure out, that might indicate the fun part is too ambitious." (John Naylor)
https://postgr.es/m/CAFBsxsG4OWHBbSDM%3DsSeXrQGOtkPiOEOuME4yD7Ce41NtaAD9g%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-03-13 19:01:54 Re: clarify equalTupleDescs()
Previous Message Nathan Bossart 2024-03-13 18:52:45 Re: Reports on obsolete Postgres versions