Re: BUG #5798: Some weird error with pl/pgsql procedure

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5798: Some weird error with pl/pgsql procedure
Date: 2010-12-22 21:53:45
Message-ID: AANLkTi==2o9JPJwwfOsL2CnFpcw1YYy_AY_qCiKtg05W@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

It happened again in much simpler case (no slony or deadlock message
involved... no slony at all on the database).

The same error
DBD::Pg::db selectall_arrayref failed: ERROR: cannot extract system
attribute from virtual tuple
CONTEXT: PL/pgSQL function "__clear_table_tail" line 50 at FOR over
EXECUTE statement

It happened when function worked with small table with only 33 pages
size and 600 live tuples.
What made situations even more easy to track it is I know this table
don't have any inserts or deletes for ages (only select and update)
and I have full log of database queries during problem.

Log of call of the procedure:
2010-12-23 00:21:04 MSK pgsql(at)billing 55717 ERROR: cannot extract
system attribute from virtual tuple
2010-12-23 00:21:04 MSK pgsql(at)billing 55717 CONTEXT: PL/pgSQL
function "__clear_table_tail" line 50 at FOR over EXECUTE statement
2010-12-23 00:21:04 MSK pgsql(at)billing 55717 STATEMENT:
SELECT * from __clear_table_tail('ctrl_servers', 'bill', 'opt', 21, 1)

In the same time log all actions involving changes on table
ctrl_servers during +/- 3 second from ERROR:
2010-12-23 00:21:03 MSK bill(at)billing 42078 LOG: duration: 0.736 ms
statement: UPDATE ctrl_servers SET last_connect = 'now' WHERE ( id =
'1514' )
Just one query.

Line where ERROR happens:
FOR _new_tid IN EXECUTE 'UPDATE ONLY ' ||
quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname) || '
SET ' || arg_field || ' = ' || arg_field || ' WHERE ctid=ANY($1)
RETURNING ctid' USING _current_tid_list LOOP

Need be said _current_tid_list at start of procedure call contained
all possible tids for 21 page of relation (not only for live tuples
but all possible (in range like {21,1} to {21,300+})).

I have an idea the error happens because some rare race conditions
with tid scan and concurrent updates in the same page (probably HOT
somehow involved... because both updateable fields in function and
concurrent update query updated non-indexed fields).

Currently I work on creating self contained test case (unfortunately
without much success until now).

About stack backtrace I not sure how to get it (and even worse the
database is heavily loaded production DB so I not sure creating core
dump would be good idea).
Still I would like to receive any help with creating a stack backtrace.

On Wed, Dec 22, 2010 at 5:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 20, 2010 at 9:48 PM, Maxim Boguk <Maxim(dot)Boguk(at)gmail(dot)com> wrote:
>> Anyone can enlighten me what happens here?
>
> That does look weird, but without a simple test case I think it's
> going to be hard to investigate this.  Installing Slony and your code
> and building a 130GB is more effort than I'm willing to put in...
>
> Any way you can get a stack backtrace at the point the error occurs?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
МойКруг: http://mboguk.moikrug.ru/

Сила солому ломит, но не все в нашей жизни - солома, да и сила далеко не все.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2010-12-23 01:45:41 Re: BUG #5781: unaccent() function should be marked IMMUTABLE
Previous Message Fujii Masao 2010-12-22 07:58:29 Re: memory leaks? using savepoint