From: | marko(at)joh(dot)to |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #9557: Row not visible after receiving notification |
Date: | 2014-03-13 09:39:22 |
Message-ID: | 20140313093922.335.80556@wrigleys.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 9557
Logged by: Marko Tiikkaja
Email address: marko(at)joh(dot)to
PostgreSQL version: 9.1.12
Operating system: Linux
Description:
Hi,
We went live with an application a few days ago, and it's been having
problems with seeing rows after receiving a notification from the
transaction that committed it. I can't say for sure how often it happens,
but I've seen evidence suggesting at least three separate occasions.
The general design is that we have two independent "RPC Workers" LISTENing
for notifications. Upon receiving a notification, they call a function
Claim_Request() which does this:
BEGIN
SELECT
Requests.RequestID INTO _RequestID
FROM rpcworker.Requests
WHERE
state = 'UNCLAIMED' AND
-- sanity check
Requests.Claimed IS NULL
ORDER BY Requests.RequestID
LIMIT 1
FOR UPDATE ;
IF NOT FOUND THEN
RETURN;
END IF;
-- otherwise UPDATE to set claimed=now()
I suspected it might be something wonky with FOR UPDATE, so I added an extra
query sent right before Claim_Request():
SELECT Requests.RequestID
FROM rpcworker.Requests
WHERE state = 'UNCLAIMED' AND
Requests.Claimed IS NULL
ORDER BY Requests.RequestID
LIMIT 1
Below I have attached the relevant portions of the log when this happened.
In the example, the SELECT without the FOR UPDATE did not see the row, but
the subsequent Claim_Request() did (supposedly due to the small latency
added by the extra round-trip by the SELECT before it).
RPCWorker B appeared to be working "normally" in this case (though it was a
bit slower to react than A), but I've attached its logs for posterity as
well. In previous cases, where both of them called Claim_Request()
directly, we saw a couple of instances where *neither* of the RPC workers
saw the new row, which suggests something's terribly wrong.
## RPCWORKER B (UTC): ##
received notification with payload 8839
2014-03-12 21:11:27.250 +0000 LOG: received notification on channel "RPC
Worker"
2014-03-12 21:11:27.253 +0000 LOG: saw requestid 8839 without FOR UPDATE
## RPCWORKER A (UTC): ##
received notification with payload 8839
2014-03-12 21:11:27.240 +0000 LOG: received notification on channel "RPC
Worker"
2014-03-12 21:11:27.245 +0000 LOG: plain SELECT: sql: no rows in result
set
2014-03-12 21:11:27.251 +0000 WARNING: SNAKE: saw 0, claimed 8839
2014-03-12 21:11:27.251 +0000 LOG: received requestid 8839
2014-03-12 21:11:27.253 +0000 LOG: plain SELECT: sql: no rows in result
set
2014-03-12 21:11:27.531 +0000 LOG: request 8839 (attempt 1) succeeded
2014-03-12 21:11:27.562 +0000 LOG: finished request 8839 (attempt 1)
## DBWORKER (UTC + 1): ##
2014-03-12 22:11:27.237
CET,"dbworker","db",34682,"10.1.1.71:42307",531f39d8.877a,211904,"SELECT",2014-03-11
17:29:12 CET,583/727697,1268656677,WARNING,01000,"
# the above line is the last piece of information we see from the
transaction sending the notification
# committed somewhere around here
2014-03-12 22:11:27.246
CET,"dbworker","db",34682,"10.1.1.71:42307",531f39d8.877a,211905,"SELECT",2014-03-11
17:29:12 CET,583/0,0,LOG,00000,"duration: 42.826
ms",,,,,,,,"exec_simple_query, postgres.c:1090",""
# known committed at this point (but a query has already been executed!)
## RPCWORKER A IN DB (UTC + 1): ##
2014-03-12 22:11:21.052
CET,"rpcworker","db",33568,"10.1.1.72:48556",53207d07.8320,12480,"SELECT",2014-03-12
16:28:07 CET,34/0,0,LOG,00000,"duration: 0.741
ms",,,,,,,,"exec_simple_query, postgres.c:1090","rpcworker
# idle for a good while here
2014-03-12 22:11:27.245
CET,"rpcworker","db",33568,"10.1.1.72:48556",53207d07.8320,12481,"idle",2014-03-12
16:28:07 CET,34/2000571,0,LOG,00000,"statement: -- NO FOR UPDATE
2014-03-12 22:11:27.245
CET,"rpcworker","db",33568,"10.1.1.72:48556",53207d07.8320,12482,"SELECT",2014-03-12
16:28:07 CET,34/0,0,LOG,00000,"duration: 0.983
ms",,,,,,,,"exec_simple_query, postgres.c:1090","rpcworker"
# didn't see the row
2014-03-12 22:11:27.252
CET,"rpcworker","db",33568,"10.1.1.72:48556",53207d07.8320,12483,"idle",2014-03-12
16:28:07 CET,34/2000572,0,LOG,00000,"statement: -- Claim_Request()
# sees the row!
## RPCWORKER B IN DB (UTC + 1): ##
2014-03-12 22:11:27.253
CET,"rpcworker","db",32865,"10.2.1.72:33236",53207cee.8061,14392,"idle",2014-03-12
16:27:42 CET,95/1580775,0,LOG,00000,"statement: -- NO FOR UPDATE
# is a bit sluggish, so sees the row!
2014-03-12 22:11:27.254
CET,"rpcworker","db",32865,"10.2.1.72:33236",53207cee.8061,14394,"idle",2014-03-12
16:27:42 CET,95/1580776,0,LOG,00000,"statement: -- Claim_Request()
# would likely see the row, but already claimed by worker A
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2014-03-13 10:52:28 | Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown |
Previous Message | Kyotaro HORIGUCHI | 2014-03-13 08:14:28 | Re: BUG #9553: why bitmap index scan startup_cost=0? it's a bug? |