Re: regression, deadlock in high frequency single-row UPDATE

From: Andrew Sackville-West <awest(at)janrain(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: regression, deadlock in high frequency single-row UPDATE
Date: 2014-08-04 16:12:05
Message-ID: 20140804161205.GA30380@andrew-ThinkPad-X230
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Jul 31, 2014 at 08:59:50PM -0400, Alvaro Herrera wrote:
> Andrew Sackville-West wrote:
> > Hi,
> >
> > I've discussed this problem on irc a couple of times and think I've
> > found a regression that plagues our application, introduced in some
> > version newer than 9.1.9, and still present in 9.3.5.
>
> I don't think you have provided everything: for instance I see that your
> log has INSERTs into table access_tokens, which has a foreign key
> relationship to/from some other table you showed.

Ah you're correct, I'll attach the description of that table. But,
really, that is all that's going on in the database.

>
> I'm too lazy to unscramble the whole thing into a real schema right now.
> pg_dump output would be more helpful, as would some trivial sample data
> enough to reproduce the problem.

It's really minimal data. I'll attach a dump of the access_grants and
oauth_clients table. The access_tokens table is not empty, but I'm not
sure how relevant the data is. What is shown in the log is
typical. There is another table in play here, refresh_tokens, but
we're not actually using them in this case -- the table is empty, and
the reference to it in access_tokens in null.

> It'd probably be easier to reproduce if you wrap the statements in
> transactions and have them sleep before COMMIT.

The issue here is, with no code changes, this works fine under very
heavy load in production against 9.0.x, 9.1.9, etc. But under
9.3.{3,4,5} it fails. That code has no transaction. I think adding a
transaction is a distraction from what appears to me to be a
regression.

Interestingly, if I change the code to include a transaction around
the UPDATE, the problem goes away (or so it seems so far in my testing).

> (If you run two of
> these updates in parallel, the first one blocks the second until the
> first one commits, because they update the same row. What happens if
> the blocking session runs the aforementioned INSERT while the other is
> blocked? Perhaps three sessions are needed to show a problem without
> explicit transactions.)

I've run pg_bench from 3 machines all pointing at the same database
running this same set of statements and it fails to induce the
deadlock.

It occurs to me that I haven't completely described the sequence of
actions, so here is a snippet of the actual SQL that a given action
induces, when it works as expected:

SET standard_conforming_strings TO on;

SET datestyle TO ISO;

SET client_encoding TO utf8;

SELECT "user_1"."id", "user_1"."uuid", "user_1"."lastUpdated" AT TIME
ZONE 'UTC', "user_1"."id" as _id
FROM "z8z6px927zu6qzzbnb5ntgghxg"."user_1" AS "user_1" WHERE "user_1"."id" = 129;

UPDATE "z8z6px927zu6qzzbnb5ntgghxg"."access_grants" ag
SET last_issued=DEFAULT
FROM "z8z6px927zu6qzzbnb5ntgghxg"."oauth_clients" oc
WHERE oc.id = ag.client_id AND ag.entity_name = 'user'
AND ag.entity_id = 129 AND oc.client_id = '3hp45h9d4f9wwtx7cvpus6rdb4s5kb9f'
RETURNING ag.id;

INSERT INTO "z8z6px927zu6qzzbnb5ntgghxg"."access_tokens"
(token, access_id, refresh_token_id, issued, expiry)
VALUES
('xys7vzs94huex2y7', 2, null, timezone('UTC'::text, now()), timezone('UTC'::text, now()) + '1:00:00':: interval);

This represents one complete action. There are several of these
running concurrently, interleaved together. There is no transaction
involved here. As far as I can tell, the application is using
bog-standard libpq (via FFI wrapper in Haskell) with no non-default
configuration applied to the connection.

I realize there's not a lot to go on here, and I don't expect help
debugging my application. I've got a patch that seems to resolve the
problem for us. But I'm suspicious of this change in behavior and hope
that pointing it out can help in some way.

regards,

A

>
> --
> Álvaro Herrera http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--

Attachment Content-Type Size
access_tokens_table text/plain 1.3 KB
sample_data text/plain 704 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2014-08-04 16:30:27 Re: regression, deadlock in high frequency single-row UPDATE
Previous Message Bruce Momjian 2014-08-04 15:57:01 Re: BUG #11090: Unclear error message in pg_upgrade