Re: XactLockTableWait doesn't set wait_event correctly

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XactLockTableWait doesn't set wait_event correctly
Date: 2016-11-30 11:50:12
Message-ID: CANP8+jJrtCGp6U7GBwcdk7_hGYaX0HoZKVHJUTke-iuNx2XqCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 November 2016 at 19:03, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> how will we distinguish it when some
> process is actually waiting on tuple lock?

The point is that both those actions are waiting for a tuple lock.

Obtaining a tuple lock requires two separate actions: First we do
LockTuple() and then we do XactLockTableWait().

So log_lock_wait output has two separate log entries for the same
thing, which is inconsistent. (One mentions the relation name, the
other mentions the relation id).

(Note that I'm not saying that all callers of XactLockTableWait are
tuple lockers; if they were it would be less confusing).

But at least that info is visible. log_lock_waits output allows you to
see that a XactLockTableWait is actually held for a tuple. There is no
way to do that for pg_stat_activity or pg_locks output, which is
inconsistent.

I'm not worried about abstraction layers, I'd just like to get useful
information out of the server to diagnose locking issues. Right now,
nobody can do that.

My proposal to resolve this is...

1. Make log_lock_wait output the same for both cases... following this format
LOG: process 648 still waiting for ExclusiveLock on tuple (0,1) of
relation 137344 of database 12245 after 1045.220 ms
DETAIL: Process holding the lock: 6460. Wait queue: 648.
STATEMENT: update t1 set c1=4 where c1=1;
Nobody will miss the other format, since the above format has all the
same information.

2. Set wait_event_type = tuple when we wait during XactLockTableWait.
We need the reason info, not the actual wait info, since this is for
users not for our own diagnostics. This isn't very important, since
wait_event_type doesn't include details like which tuple or relation
caused the wait.

3. pg_locks output can't fit both locktag and reason info inside the
LOCKTAG struct, so we'd need to do something like store the reason
info in a separate hash table, so it can be used to explain a
transaction lock entry. I'm sure that will raise an objection, so
we'll need something like a view called pg_lock_wait_reason. Better
suggestions welcome.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2016-11-30 11:51:27 Re: Random number generation, take two
Previous Message Amit Kapila 2016-11-30 10:53:50 Re: Fix checkpoint skip logic on idle systems by tracking LSN progress