Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

From: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date: 2024-03-13 10:04:43
Message-ID: CAGECzQRvmUK5-d68A+cm+fgmfht9Dv2uZ28-qq3QiaF6EAZqPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 13 Mar 2024 at 04:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I suspect it's basically just a
> timing dependency. Have you thought about the fact that a cancel
> request is a no-op if it arrives after the query's done?

I agree it's probably a timing issue. The cancel being received after
the query is done seems very unlikely, since the query takes 180
seconds (assuming PG_TEST_TIMEOUT_DEFAULT is not lowered for these
animals). I think it's more likely that the cancel request arrives too
early, and thus being ignored because no query is running yet. The
test already had logic to wait until the query backend was in the
"active" state, before sending a cancel to solve that issue. But my
guess is that that somehow isn't enough.

Sadly I'm having a hard time reliably reproducing this race condition
locally. So it's hard to be sure what is happening here. Attached is a
patch with a wild guess as to what the issue might be (i.e. seeing an
outdated "active" state and thus passing the check even though the
query is not running yet)

Attachment Content-Type Size
v37-0001-Hopefully-make-cancel-test-more-reliable.patch application/octet-stream 4.2 KB
v37-0002-Start-using-new-libpq-cancel-APIs.patch application/octet-stream 10.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-03-13 10:18:33 Re: Vectored I/O in bulk_write.c
Previous Message shveta malik 2024-03-13 09:15:14 Re: Introduce XID age and inactive timeout based replication slot invalidation