Re: lock on object is already held

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Wood <dwood(at)salesforce(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lock on object is already held
Date: 2013-12-01 04:50:42
Message-ID: CAFj8pRAdN9UgWTtwp9ixH09+iD4pY_LB1Wc-haOA-e=G4pzn2Q@mail.gmail.com
Lists: pgsql-hackers

Hello,

We found this issue a year ago:
http://www.postgresql.org/message-id/CAFj8pRAHVzUPfBx+8EY-XHfwBo8bxVu_YNMBAPSDj8W-ARatLA@mail.gmail.com

I tried to reproduce this error, without success, so I prepared a patch
meant to help identify the issue. The important part is a backport of
the process startup code from 9.2. After applying it, we never detected
this issue again.

Regards

Pavel

2013/11/29 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Daniel Wood <dwood(at)salesforce(dot)com> writes:
> > ... Presuming your fix is putting PG_SETMASK(&UnBlockSig)
> > immediately before each of the 6 calls to ereport(ERROR,...) I've been
> > running the stress test with both this fix and the lock already held fix.
>
> I'm now planning to put it in error cleanup instead, but that's good
> enough for proving that the problem is what I thought it was.
>
> > I get plenty of lock timeout errors as expected. However, once in a great
> > while I get: sqlcode = -400, sqlstate = 57014, sqlerrmc = canceling
> > statement due to user request
> > My stress test certainly doesn't do a user cancel. Should this be expected?
>
> I think I see what must be happening there: the lock timeout interrupt is
> happening at some point after the lock has been granted, but before
> ProcSleep reaches its disable_timeouts call. QueryCancelPending gets set,
> and will be honored next time something does CHECK_FOR_INTERRUPTS.
> But because ProcSleep told disable_timeouts to clear the LOCK_TIMEOUT
> indicator bit, ProcessInterrupts thinks the cancel must've been a plain
> user SIGINT, and reports it that way.
>
> What we should probably do about this is change ProcSleep to not clear the
> LOCK_TIMEOUT indicator bit, same as we already do in LockErrorCleanup,
> which is the less-race-condition-y path out of a lock timeout.
>
> (It would be cooler if the timeout handler had a way to realize that the
> lock is already granted, and not issue a query cancel in the first place.
> But having a signal handler poking at shared memory state is a little too
> scary for my taste.)
>
> It strikes me that this also means that places where we throw away pending
> cancels by clearing QueryCancelPending, such as the sigsetjmp stanza in
> postgres.c, had better reset the LOCK_TIMEOUT indicator bit. Otherwise,
> a thrown-away lock timeout cancel might cause a later SIGINT cancel to be
> misreported.
>
> regards, tom lane
>
>

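The race Tom describes above can be illustrated with a small toy model. This is only a sketch in Python, not PostgreSQL code: the `Backend` class, its method names, and the two scenario functions are hypothetical stand-ins for QueryCancelPending, the LOCK_TIMEOUT indicator bit, ProcSleep's exit path, and CHECK_FOR_INTERRUPTS, and the model assumes the cancel reason is decided purely from the indicator bit.

```python
class Backend:
    """Toy model of one backend's cancel state (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.query_cancel_pending = False    # stands in for QueryCancelPending
        self.lock_timeout_indicator = False  # stands in for the LOCK_TIMEOUT indicator bit

    def lock_timeout_fires(self):
        # Timeout handler: record the cause, then request a query cancel.
        self.lock_timeout_indicator = True
        self.query_cancel_pending = True

    def user_sigint(self):
        # A plain user cancel sets only the pending flag.
        self.query_cancel_pending = True

    def leave_proc_sleep(self, clear_indicator):
        # Lock already granted; timeouts are disabled on the way out.
        if clear_indicator:
            self.lock_timeout_indicator = False

    def discard_pending_cancel(self, reset_indicator):
        # Error recovery throws away a pending cancel.
        self.query_cancel_pending = False
        if reset_indicator:
            self.lock_timeout_indicator = False

    def check_for_interrupts(self):
        # Report the reason for a pending cancel, if there is one.
        if not self.query_cancel_pending:
            return None
        self.query_cancel_pending = False
        if self.lock_timeout_indicator:
            self.lock_timeout_indicator = False
            return "canceling statement due to lock timeout"
        return "canceling statement due to user request"


def late_timeout(clear_indicator):
    """Timeout fires after the lock is granted but before timeouts are disabled."""
    b = Backend()
    b.lock_timeout_fires()
    b.leave_proc_sleep(clear_indicator)
    return b.check_for_interrupts()


def discarded_then_sigint(reset_indicator):
    """A lock timeout cancel is thrown away, then a real SIGINT arrives."""
    b = Backend()
    b.lock_timeout_fires()
    b.discard_pending_cancel(reset_indicator)
    b.user_sigint()
    return b.check_for_interrupts()
```

In this model, `late_timeout(True)` reproduces the surprising "user request" message from the stress test, while `late_timeout(False)` corresponds to the proposed change of leaving the indicator set. Likewise, `discarded_then_sigint(False)` shows a later SIGINT being misreported as a lock timeout unless the discard path also resets the indicator.
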
Attachment Content-Type Size
orphaned_locks.patch text/x-patch 4.7 KB
