Re: recent deadlock regression test failures

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recent deadlock regression test failures
Date: 2017-04-08 13:11:08
Message-ID: CAEepm=0ew3_soZf9DBp9DY19mP=CeMo38CfCbfLKX2LrHVR7_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 8, 2017 at 4:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> Here is an attempt at option 2 from the menu I posted above. Questions:
>
>> 1. Does anyone object to this extension of pg_blocking_pids()'s
>> remit? If so, I could make it a separate function (that was option
>> 3).
>
> It seems an entirely principle-free change in the function's definition.

Well... other backends can block a SERIALIZABLE DEFERRABLE
transaction, so it doesn't seem that unreasonable to expect that a
function named pg_blocking_pids(blocked_pid) described as "get array
of PIDs of sessions blocking specified backend PID" should be able to
tell you who they are.

You might say that pg_blocking_pid() is about locking only and not
arbitrary other kinds of waits, but safe snapshots are not completely
unrelated to locking if you tilt your head at the right angle:
GetSafeSnapshot waits for all transactions that might hold SIRead
locks that could affect this transaction's serializability to
complete.

But I can see that it's an odd case. Minimal separate function
version attached.

> I'm not actually clear on why Kevin wanted this change in
> isolationtester's wait behavior anyway, so maybe some clarification
> on that would be a good idea.

I can't speak for Kevin but here's my argument as patch author: One
of the purposes of the isolation tester is to test our transaction
isolation. SERIALIZABLE DEFERRABLE is a special case of one of our
levels and should be tested. Statement s3r in the new spec
read-only-anomaly-3.spec runs at that level and causes the backend to
wait for another backend. Without any change to isolationtester it
would hang on that statement until timeout failure. Therefore I
proposed that isolationtester should recognise this kind of waiting
and proceed to later steps that can unblock it, like so:

step s3r: SELECT id, balance FROM bank_account WHERE id IN ('X', 'Y')
ORDER BY id; <waiting ...>
step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
step s2c: COMMIT;
step s3r: <... completed>

> But if we need it, I think probably
> a dedicated function would be a good thing. We want the wait-checking
> query to be as trivial as possible at the SQL level, so whatever
> semantic oddities it needs to have should be pushed into C code.

Based on the above, here is a version that introduces a simple boolean
function pg_waiting_for_safe_snapshot(pid) and adds that the the
query. This was my "option 1" upthread.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
boolean-function.patch application/octet-stream 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2017-04-08 13:46:33 Re: plpgsql - additional extra checks
Previous Message Bruce Momjian 2017-04-08 10:39:16 Re: [pgsql-www] Small issue in online devel documentation build