Re: SSI bug?

From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: drkp(at)csail(dot)mit(dot)edu, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-03-28 03:16:16
Message-ID: 20110328031616.CDAA719CEDF@mail.netbsd.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

hi,

>>> (6) Does the application continue to run relatively sanely, or
>>> does it fall over at this point?
>>
>> my application just exits on the error.
>>
>> if i re-run the application without rebooting postgres, it seems
>> that i will get the error sooner than the first run. (but it might
>> be just a matter of luck)
>
> If your application hits this again, could you check pg_stat_activity
> and pg_locks and see if any SIReadLock entries are lingering after
> the owning transaction and all overlapping transactions are
> completed? If anything is lingering between runs of your
> application, it *should* show up in one or the other of these.

this is 71ac48fd9cebd3d2a873635a04df64096c981f73 with your two patches.
this psql session was the only activity to the server at this point.

hoge=# select * from pg_stat_activity;
-[ RECORD 1 ]----+--------------------------------
datid | 16384
datname | hoge
procpid | 7336
usesysid | 10
usename | takashi
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2011-03-26 12:28:21.882226+09
xact_start | 2011-03-28 11:55:19.300027+09
query_start | 2011-03-28 11:55:19.300027+09
waiting | f
current_query | select * from pg_stat_activity;

hoge=# select count(*) from pg_locks where mode='SIReadLock';
-[ RECORD 1 ]
count | 7057

hoge=# select locktype,count(*) from pg_locks group by locktype; -[ RECORD 1 ]--------
locktype | virtualxid
count | 1
-[ RECORD 2 ]--------
locktype | relation
count | 1
-[ RECORD 3 ]--------
locktype | tuple
count | 7061

hoge=#

>
>>> (7) The message hint would help pin it down, or a stack trace at
>>> the point of the error would help more. Is it possible to get
>>> either? Looking over the code, it appears that the only places
>>> that SSI could generate that error, it would cancel that
>>> transaction with the hint "You might need to increase
>>> max_pred_locks_per_transaction." and otherwise allow normal
>>> processing.
>>
>> no message hints. these errors are not generated by SSI code,
>> at least directly.
>
> Right, that's because we were using HASH_ENTER instead of
> HASH_ENTER_NULL. I've posted a patch which should correct that.

sure, with your patch it seems that they turned into different ones.

PG_DIAG_SEVERITY: WARNING
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: shmem.c
PG_DIAG_SOURCE_LINE: 190
PG_DIAG_SOURCE_FUNCTION: ShmemAlloc

PG_DIAG_SEVERITY: ERROR
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_MESSAGE_HINT: You might need to increase max_pred_locks_per_transaction.
PG_DIAG_SOURCE_FILE: predicate.c
PG_DIAG_SOURCE_LINE: 2049
PG_DIAG_SOURCE_FUNCTION: CreatePredicateLock

>>> Even with the above information it may be far from clear where
>>> allocations are going past their maximum, since one HTAB could
>>> grab more than its share and starve another which is staying below
>>> its "maximum". I'll take a look at the possibility of adding a
>>> warning or some such when an HTAB expands past its maximum size.
>
> I see from your later post that you are running with this patch. Has
> that reported anything yet?

i got nothing except the following one. (in the server log)

WARNING: hash table "ShmemIndex" has more entries than expected
DETAIL: The maximum was set to 32 on creation.

YAMAMOTO Takashi

>
> Thanks,
>
> -Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-03-28 04:01:40 Re: Comments on SQL/Med objects
Previous Message Tom Lane 2011-03-28 02:43:55 Re: Lock problem with autovacuum truncating heap