Re: Patch: show relation and tuple infos of a lock to acquire

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Christian Kruse <christian(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: show relation and tuple infos of a lock to acquire
Date: 2014-03-09 06:45:55
Message-ID: CAA4eK1Lq94++SMhydfffkPUof4H0ruYt9FNsjgMMvVg0B2hU+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 27, 2014 at 4:14 PM, Christian Kruse
<christian(at)2ndquadrant(dot)com> wrote:
> On 25/02/14 16:11, Robert Haas wrote:
>> Reading this over, I'm not sure I understand why this is a CONTEXT at
>> all and not just a DETAIL for the particular error message that it's
>> supposed to be decorating. Generally CONTEXT should be used for
>> information that will be relevant to all errors in a given code path,
>> and DETAIL for extra information specific to a particular error.
>
> Because there is more than one scenario where this context is useful,
> not just log_lock_wait messages.
>
>> If we're going to stick with CONTEXT, we could rephrase it like this:
>>
>> CONTEXT: while attempting to lock tuple (1,2) in relation with OID 3456
>>
>> or when the relation name is known:
>>
>> CONTEXT: while attempting to lock tuple (1,2) in relation "public"."foo"
>
> Accepted. Patch attached.

Thanks. I have done review of this patch and found couple of things
which I feel should be addressed.

1.
"> ISTM that you should be printing just the value and the unique index
> there, and not any information about the tuple proper.

Good point, I will have a look at this."

This point is still not handled in patch, due to which the message
it displays seems to be incomplete. Message it displays is as below:

LOG: process 1800 still waiting for ShareLock on transaction 679 after 1014.000
ms
CONTEXT: while attempting to lock in relation "public"."idx_t1" of database pos
tgres

Here it is not clear what it attempts *lock in*

2. In function XactLockTableWaitErrorContextCallback(), ignore dropped
columns in tuple, else it will lead to following failure:

Session-1

postgres=# select * from t1;
c1 | c2
----+----
2 | 99
(1 row)

postgres=# Alter table t1 drop column c2;
ALTER TABLE
postgres=# begin;
BEGIN
postgres=# delete from t1 where c1=2;

Session-2
postgres=# begin;
BEGIN
postgres=# delete from t1 where c1=2;
ERROR: cache lookup failed for type 0
CONTEXT: while attempting to lock tuple (0,2) in relation "public"."t1" of data
base postgres

Problem is in Session-2 (cache lookup failed), when it tries to print values
for dropped attribute.

3.
" in relation \"%s\".\"%s\" of database %s",
Why to display only relation name in quotes?
I think database name should also be displayed in quotes.

4.
Context message:
"while attempting to lock tuple (0,2) ".

I think it will be better to rephrase it (may be by using 'operate on'
instead of 'lock').
The reason is that actually we trying to lock on transaction, so it doesn't make
good sense to use "lock on tuple"

5.
+ XactLockTableWaitWithInfo(relation, &tp, xwait);

+ MultiXactIdWaitWithErrorContext(relation, &tp, (MultiXactId) xwait,
+ MultiXactStatusUpdate, NULL, infomask);

I think we should make the name of MultiXactIdWaitWithErrorContext()
similar to XactLockTableWaitWithInfo.

6.
@@ -1981,7 +1981,8 @@ EvalPlanQualFetch(EState *estate, Relation
relation, int lockmode,
if (TransactionIdIsValid(SnapshotDirty.xmax))
{
ReleaseBuffer(buffer);
- XactLockTableWait(SnapshotDirty.xmax);
+ XactLockTableWaitWithInfo(relation, &tuple,
+ SnapshotDirty.xmax);

In function EvalPlanQualFetch(), we are using SnapshotDirty to fetch
the tuple, so I think there is a chance that it will log the tuple which
otherwise will not be visible. I don't think this is right.

7.
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1307,7 +1307,7 @@ retry:
if (TransactionIdIsValid(xwait))
{
index_endscan(index_scan);
- XactLockTableWait(xwait);
+ XactLockTableWaitWithInfo(heap, tup, xwait);

In function check_exclusion_constraint(), DirtySnapshot is used,
so it seems to me that this will also have the problem of logging
of tuple which otherwise will not be visible.

8.
void
WaitForLockers(LOCKTAG heaplocktag, LOCKMODE lockmode)
{
- List *l;
+ List *l;

Extra spacing not needed.

9.
/*
* XactLockTableWaitErrorContextCallback
* error context callback set up by
* XactLockTableWaitWithInfo. It reports some
* tuple information and the relation of a lock to aquire
*
*/
Please improve indentation of above function header

10.
+#include "access/htup.h"
+
+struct XactLockTableWaitLockInfo
+{
+ Relation rel;
+ HeapTuple tuple;
+};

I think it is better to add comments for this structure.
You can refer comments for struct HeapUpdateFailureData

11.
+ *
+ * Use this instead of calling XactTableLockWait()

In above comment, function name used is wrong and I am not sure this
is right statement to use because still we are using XactLockTableWait()
at few places like in function Do_MultiXactIdWait() and recently this is used
in function SnapBuildFindSnapshot().

12.
heap_update()
{
..
..
/*
* There was no UPDATE in the MultiXact; or it aborted. No
* TransactionIdIsInProgress() call needed here, since we called
* MultiXactIdWait() above.

Change the function name in above comment.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martín Marqués 2014-03-09 12:23:33 Re: Regression test errors
Previous Message Bruce Momjian 2014-03-09 01:55:18 Re: ANALYZE sampling is too good