Re: lock on object is already held

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Daniel Wood <dwood(at)salesforce(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: lock on object is already held
Date:	2013-12-01 04:50:42
Message-ID:	CAFj8pRAdN9UgWTtwp9ixH09+iD4pY_LB1Wc-haOA-e=G4pzn2Q@mail.gmail.com
Lists:	pgsql-hackers

Hello,

we found this issue a year ago:
http://www.postgresql.org/message-id/CAFj8pRAHVzUPfBx+8EY-XHfwBo8bxVu_YNMBAPSDj8W-ARatLA@mail.gmail.com

I tried to reproduce this error, but without success, so I prepared a patch
intended to help identify the issue. The important part is a backport of
process startup from 9.2. After applying it, we never detected this issue
again.

Regards

Pavel

2013/11/29 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Daniel Wood <dwood(at)salesforce(dot)com> writes:
> > ... Presuming your fix is putting PG_SETMASK(&UnBlockSig)
> > immediately before each of the 6 calls to ereport(ERROR,...) I've been
> > running the stress test with both this fix and the lock already held fix.
>
> I'm now planning to put it in error cleanup instead, but that's good
> enough for proving that the problem is what I thought it was.
>
> > I get plenty of lock timeout errors as expected. However, once in a great
> > while I get: sqlcode = -400, sqlstate = 57014, sqlerrmc = canceling
> > statement due to user request
> > My stress test certainly doesn't do a user cancel. Should this be expected?
>
> I think I see what must be happening there: the lock timeout interrupt is
> happening at some point after the lock has been granted, but before
> ProcSleep reaches its disable_timeouts call. QueryCancelPending gets set,
> and will be honored next time something does CHECK_FOR_INTERRUPTS.
> But because ProcSleep told disable_timeouts to clear the LOCK_TIMEOUT
> indicator bit, ProcessInterrupts thinks the cancel must've been a plain
> user SIGINT, and reports it that way.
>
> What we should probably do about this is change ProcSleep to not clear the
> LOCK_TIMEOUT indicator bit, same as we already do in LockErrorCleanup,
> which is the less-race-condition-y path out of a lock timeout.
>
> (It would be cooler if the timeout handler had a way to realize that the
> lock is already granted, and not issue a query cancel in the first place.
> But having a signal handler poking at shared memory state is a little too
> scary for my taste.)
>
> It strikes me that this also means that places where we throw away pending
> cancels by clearing QueryCancelPending, such as the sigsetjmp stanza in
> postgres.c, had better reset the LOCK_TIMEOUT indicator bit. Otherwise,
> a thrown-away lock timeout cancel might cause a later SIGINT cancel to be
> misreported.
>
> regards, tom lane

Attachment	Content-Type	Size
orphaned_locks.patch	text/x-patch	4.7 KB

In response to

Re: lock on object is already held at 2013-11-29 20:13:51 from Tom Lane
