BUG #5459: Unable to cancel query while in send()

From: "Mason Hale" <mason(at)onespot(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5459: Unable to cancel query while in send()
Date: 2010-05-11 21:29:06
Message-ID: 201005112129.o4BLT6nm051435@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5459
Logged by: Mason Hale
Email address: mason(at)onespot(dot)com
PostgreSQL version: 8.3.8
Operating system: Redhat EL 5.1-64 bit
Description: Unable to cancel query while in send()
Details:

ISSUE: unable to cancel queries using pg_cancel_backend() while they are in
a send() function call, waiting on client receipt of data.

EXPECTED RESULT: expect to be able to cancel most/all queries using
pg_cancel_backend() as superuser, perhaps with some wait time, but not an
hour or more.

= SYMPTOM =

A SELECT query was running over 18 hours on our PostgreSQL 8.3.8 server.
Verified that it was not waiting on any locks via pg_stat_activity.
Attempted to cancel the query using pg_cancel_backend(), which returned 't'.
However, more than an hour later the process was still active, using about 6%
of CPU and 5% of RAM.

Terminating the client process that was running the query (from another
server) did not cause the query process on the pgsql server to stop. In this
case the client was connecting via an ssh tunnel through an intermediate
'gateway' server.

Connection path was:

CLIENT --> SSH GATEWAY --> DB SERVER

= DIAGNOSIS =

Diagnosed this issue with help from 'andres' in #postgresql IRC. Per his
request, attached to 'stuck' process using gdb, generating the following
outputs:

- Initial backtrace: http://pgsql.privatepaste.com/6f15c7e363
- ('c', then ctrl+c, then 'bt full') x 4:
http://pgsql.privatepaste.com/3d3261659a
- Stepping several times with 'n':
http://pgsql.privatepaste.com/0f302125a8

'andres' reported that interrupts were not checked in send() and probably
should be, and suggested opening this bug report.
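
To illustrate what "checking interrupts in send()" could look like in general, here is a minimal Python sketch. It is not PostgreSQL code (the backend is written in C and has its own interrupt handling); the names send_with_cancel, cancel_event and poll_interval are hypothetical. It assumes a non-blocking socket and a flag that another thread sets to request cancellation. The peer in the demo never reads, so a plain blocking send() would hang once the kernel buffer fills.

```python
import select
import socket
import threading


def send_with_cancel(sock, data, cancel_event, poll_interval=0.05):
    """Send data, but stop early if cancel_event is set (hypothetical helper).

    Returns the number of bytes actually handed to the kernel.
    """
    sock.setblocking(False)
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        # The "interrupt check": look at the flag before every wait.
        if cancel_event.is_set():
            break
        # Wait a bounded time for buffer space instead of blocking forever.
        _, writable, _ = select.select([], [sock], [], poll_interval)
        if not writable:
            continue
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            # The buffer filled up between select() and send(); retry.
            continue
    return sent


def demo():
    # The peer end 'b' never reads, standing in for a stalled client.
    a, b = socket.socketpair()
    cancel = threading.Event()
    # Stand-in for a cancel request arriving shortly after the send starts.
    threading.Timer(0.2, cancel.set).start()
    payload = b"x" * (16 * 1024 * 1024)
    sent = send_with_cancel(a, payload, cancel)
    a.close()
    b.close()
    return sent, len(payload)
```

Calling demo() returns after the cancel flag is set, with fewer bytes sent than requested; a blocking send() of the same payload would wait indefinitely on the unread socket.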

Additional investigation of the ssh tunnel connection revealed the
connection on the intermediate gateway server was stuck in a FIN_WAIT2 state
(as reported by netstat). The other end of the connection on the pgsql
server was reported as CLOSE_WAIT by netstat.
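
One plausible reading of these states (an assumption, not confirmed in the diagnosis above) is that the gateway side had sent its FIN while the backend never closed its end. The short Python sketch below, using a loopback TCP connection with hypothetical names, shows why a sender in that position does not notice: after the peer half-closes, a read sees end-of-file, but send() is still accepted.

```python
import socket


def half_close_demo():
    # Set up a loopback TCP connection; 'server' plays the backend role.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()

    # The peer sends FIN: it heads toward FIN_WAIT2, 'server' to CLOSE_WAIT.
    client.shutdown(socket.SHUT_WR)

    # Only a read on the server side reveals the half-close (empty result).
    eof = server.recv(1)
    # A send on the server side is still accepted by the kernel.
    accepted = server.send(b"row data")
    received = client.recv(64)

    server.close()
    client.close()
    return eof, accepted, received
```

A process that only writes, as a backend streaming query results does, never performs the read that would reveal the half-close, so it keeps sending until the buffers fill and then waits.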

Killing the ssh tunnel process on the gateway server cleared the connection,
and the long-running query process on the db server terminated very soon after.
