Re: [SQL] PostgreSQL server terminated by signal 11

From: "Daniel Caune" <daniel(dot)caune(at)ubisoft(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-admin(at)postgresql(dot)org>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [SQL] PostgreSQL server terminated by signal 11
Date: 2006-07-28 16:09:48
Message-ID: 1E293D3FF63A3740B10AD5AAD88535D202B65395@UBIMAIL1.ubisoft.org
Lists: pgsql-admin pgsql-sql

> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Friday, July 28, 2006 09:38
> To: Daniel Caune
> Cc: pgsql-admin(at)postgresql(dot)org; pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel(dot)caune(at)ubisoft(dot)com> writes:
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x08079e2a in slot_attisnull ()
> > (gdb) bt
> > #0 0x08079e2a in slot_attisnull ()
> > #1 0x0807a1d0 in slot_getattr ()
> > #2 0x080c6c73 in FormIndexDatum ()
> > #3 0x080c6ef1 in IndexBuildHeapScan ()
> > #4 0x0809b44d in btbuild ()
> > #5 0x0825dfdd in OidFunctionCall3 ()
> > #6 0x080c4f95 in index_build ()
> > #7 0x080c68eb in index_create ()
> > #8 0x08117e36 in DefineIndex ()
>
> Hmph. gdb is lying to you, because slot_getattr doesn't call
> slot_attisnull.
> This isn't too unusual in a non-debug build, because the symbol table is
> incomplete (no mention of non-global functions).
>
> Given that this doesn't happen right away, but only after it's been
> processing for awhile, we can assume that FormIndexDatum has been
> successfully iterated many times already, which seems to eliminate
> theories like the slot or the keycol value being bogus. I'm pretty well
> convinced now that we're looking at a problem with corrupted data. Can
> you do a SELECT * FROM (or COPY FROM) the table without error?
>
> regards, tom lane

The statement "copy gslog_event to stdout;" fails after a while with "ERROR: invalid memory alloc request size 4294967293":

(...)
354964834 2006-07-19 10:53:42.813+00 (...)
354964835 2006-07-19 10:53:44.003+00 (...)
ERROR: invalid memory alloc request size 4294967293

I then tried "select * from gslog_event where gslog_event_id >= 354964834 and gslog_event_id <= 354964900;":

354964834 | 2006-07-19 10:53:42.813+00 | (...)
354964835 | 2006-07-19 10:53:44.003+00 | (...)
354964837 | 2006-07-19 10:53:44.113+00 | (...)
354964838 | 2006-07-19 10:53:44.223+00 | (...)
(...)
(66 rows)

The statement "select * from gslog_event;" leads to "Killed"... Ouch! The psql client just exits (and the postgres server crashes too)!

The statement "select * from gslog_event where gslog_event_id <= 354964834;" succeeds.

I ran other tests on some other tables that contain less data but that also seem to be corrupted:

copy player to stdout
ERROR: invalid memory alloc request size 1918988375

select * from player where id >=771042 and id<=771043;
ERROR: invalid memory alloc request size 1918988375

select max(length(username)) from player;
ERROR: invalid memory alloc request size 1918988375

select max(length(username)) from player where id <= 771042;
max
-----
15

select max(length(username)) from player where id >= 771050;
max
-----
15

select max(length(username)) from player where id >= 771044 and id <= 771050;
max
-----
13
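The range narrowing above can also be done mechanically with a binary search. The following is only a sketch: `first_failing_id` and `simulated_range_ok` are hypothetical names, and the probe is simulated in memory rather than run against the database. In real use the probe would execute something like "select max(length(username)) from player where id >= lo and id <= hi" and return False when the server reports an error. The search assumes that a range query fails exactly when the range contains a damaged row.

```python
from typing import Callable, Optional


def first_failing_id(lo: int, hi: int,
                     range_ok: Callable[[int, int], bool]) -> Optional[int]:
    """Return the smallest id in [lo, hi] whose inclusion makes the probe fail.

    range_ok(a, b) must return True when a query over ids a..b succeeds.
    Returns None when the whole range can be read without error.
    """
    if range_ok(lo, hi):
        return None
    left, right = lo, hi
    while left < right:
        mid = (left + right) // 2
        if range_ok(lo, mid):
            left = mid + 1   # lo..mid is readable, the bad id is above mid
        else:
            right = mid      # lo..mid already fails, the bad id is at or below mid
    return left


# Simulated probe (assumption): stands in for a real range query.
# The id below is the one that fails in the tests in this message.
SIMULATED_BAD_IDS = {771043}


def simulated_range_ok(a: int, b: int) -> bool:
    return not any(a <= bad <= b for bad in SIMULATED_BAD_IDS)


if __name__ == "__main__":
    print(first_failing_id(771000, 771100, simulated_range_ok))
```

If several rows are damaged, this finds only the first one; the search can be repeated on the ids above it.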

Finally:

select * from player where id=771043;
ERROR: invalid memory alloc request size 1918988375

select id from player where id=771043;
id
--------
771043
(1 row)

agora=> select username from player where id=771043;
ERROR: invalid memory alloc request size 1918988375

I'm also pretty much convinced that some of the data is corrupted, especially in varchar columns. Before dropping the corrupted rows, is there a way to read part of the corrupted data?
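The two sizes reported in the error messages can be looked at byte by byte. This is a small sketch, not a diagnosis: `describe_alloc_size` is a hypothetical helper, and it assumes the reported size is a 32-bit value read on a little-endian host. If the bytes are printable ASCII, that is consistent with a length field having been overwritten by text, but it does not prove it.

```python
import struct


def describe_alloc_size(size: int) -> dict:
    """Show a reported allocation size as hex, signed 32-bit, and raw bytes.

    Assumption: the value fits in 32 bits and the host is little-endian.
    """
    raw = struct.pack("<I", size)  # unsigned 32-bit, little-endian byte order
    return {
        "hex": f"0x{size:08x}",
        "as_int32": struct.unpack("<i", raw)[0],  # same bits read as signed
        "bytes_le": raw,
        "printable": all(32 <= b < 127 for b in raw),
    }


if __name__ == "__main__":
    # The two sizes quoted in the error messages in this message.
    for size in (4294967293, 1918988375):
        print(size, describe_alloc_size(size))
```

Under these assumptions, the gslog_event size has the bit pattern of -3 as a signed 32-bit integer, and the player size is made of four printable ASCII bytes.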

Thanks, Tom, for your great support. I'm just afraid that I wasted your time... Anyway, I'll write a FAQ entry that provides some information about the kind of problem we have faced.

Regards,

--
Daniel
