Re: BUG #5929: ERROR: found toasted toast chunk for toast value 260340218 in pg_toast_260339342

From: Tambet Matiisen <tambet(dot)matiisen(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5929: ERROR: found toasted toast chunk for toast value 260340218 in pg_toast_260339342
Date: 2011-03-16 11:10:14
Message-ID: 4D809A96.6050209@gmail.com
Lists: pgsql-bugs

Hi Kevin!

Thanks for your reply. You made me realize that this is more serious than I
thought.

This is a development server that is also used as a pre-live server.
The pre-live database is restored from a dump of the live database every
night. So far the errors have been in the pre-live database, which is why
I didn't worry too much: it is overwritten from backup every night anyway.
Usually the error was gone the next day. I mostly blamed badly timed
backup and restore scripts, although that shouldn't result in errors.

The errors started on 07.09.2010, when I was still running PostgreSQL
8.1. A few examples:

07.09.2010:
Warning: pg_dump: ERROR: could not open relation with OID 339815468
pg_dump: SQL command to dump the contents of table "kannete_read" failed: PQendcopy() failed.
pg_dump: Error message from server: ERROR: could not open relation with OID 339815468
pg_dump: The command was: COPY public.kannete_read (yhistu_id,
  kande_rea_id, kande_id, konto_nr, alamkonto_nr, deebetsumma,
  kreeditsumma, deebetsaldo, kreeditsaldo, alamkonto_deebetsaldo,
  alamkonto_kreeditsaldo, looja, loomise_aeg, muutja, muutmise_aeg,
  kuupaev, kande_nr, kinnitatud, deebetprotsent, kreeditprotsent) TO stdout;
pg_dumpall: pg_dump failed on database "korteriy_histu", exiting

19.09.2010:
Warning: pg_dump: ERROR: unexpected chunk number 926884437 (expected 514) for toast value 1736426835
pg_dump: SQL command to dump the contents of table "failid" failed: PQendcopy() failed.
pg_dump: Error message from server: ERROR: unexpected chunk number 926884437 (expected 514) for toast value 1736426835
pg_dump: The command was: COPY public.failid (faili_id, yhistu_id,
  perioodi_id, arve_id, dokumendi_id, tyyp, sisu, laius, korgus, pikkus,
  faili_nimi, sisu_tyyp, looja, loomise_aeg, muutja, muutmise_aeg) TO stdout;
pg_dumpall: pg_dump failed on database "arvetest", exiting

24.09.2010:
Warning: pg_dump: socket not open
pg_dump: SQL command to dump the contents of table "failid" failed: PQendcopy() failed.
pg_dump: Error message from server: socket not open
pg_dump: The command was: COPY public.failid (faili_id, yhistu_id,
  perioodi_id, arve_id, dokumendi_id, tyyp, sisu, laius, korgus, pikkus,
  faili_nimi, sisu_tyyp, looja, loomise_aeg, muutja, muutmise_aeg) TO stdout;
pg_dumpall: pg_dump failed on database "arvetest", exiting

9.11.2010:
Warning: pg_dump: Dumping the contents of table "maaramised" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
pg_dump: The command was: COPY public.maaramised (maaramise_id,
  kululiigi_id, perioodi_id, yhistu_id, korteri_id, kogus, yhik, hind,
  summa, looja, loomise_aeg, muutja, muutmise_aeg) TO stdout;
pg_dumpall: pg_dump failed on database "arvetest", exiting

More recently, after I upgraded to 8.4, on 11.02.2011:
Warning: pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: compressed data is corrupt
pg_dump: The command was: COPY public.failid (faili_id, yhistu_id,
  perioodi_id, arve_id, dokumendi_id, tyyp, sisu, laius, korgus, pikkus,
  faili_nimi, sisu_tyyp, looja, loomise_aeg, muutja, muutmise_aeg) TO stdout;
pg_dumpall: pg_dump failed on database "korteriy_histu", exiting

The current error has occurred three days in a row (13-15.03.2011):
Warning: pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: found toasted toast chunk for toast value 260340218 in pg_toast_260339342
pg_dump: The command was: COPY public.yhistud_urlcache (id, url, params, sess_id, content) TO stdout;
pg_dumpall: pg_dump failed on database "yhistud", exiting

This time the error is not in the pre-live database, so it does not
go away.

I have not noticed any unusual errors in other services. The server is
also running Subversion, Trac, Apache, Samba, MySQL, Oracle, Tomcat and
so on. PostgreSQL, Subversion, Trac and Apache+PHP are used actively
every day.

Both fsync and full_page_writes are on. OK, I don't have a UPS for this
machine, but power has been stable. The current uptime is 32 days, which
I bet dates from the last kernel update. I run Debian testing on that
machine.

Currently I blame either faulty memory or a faulty software RAID driver.
I can easily rule out memory by running memtest86 for a few hours. But
how do I rule out the software RAID driver? PostgreSQL has always been
solid for me, so I suspect it least, but you never know...

Now, off to buy a UPS...

Tambet

On 15.03.2011 19:47, Kevin Grittner wrote:
> "Tambet Matiisen"<tambet(dot)matiisen(at)gmail(dot)com> wrote:
>
>> For a few days I've been getting this error from my nightly backup
>> script:
>>
>> Warning: pg_dump: SQL command failed pg_dump: Error message from
>> server: ERROR: found toasted toast chunk for toast value 260340218
>> in pg_toast_260339342 pg_dump: The command was: COPY
>> public.yhistud_urlcache (id, url, params, sess_id, content) TO
>> stdout; pg_dumpall: pg_dump failed on database "yhistud", exiting
>> Warning: Failed to dump pgsql cluster
>
> So you don't have a current backup, and your database is corrupted.
>
> (1) If you still have a backup from before you started getting
> backup failures, keep it safe until everything has settled down and
> is running well for several months.
>
> (2) Stop PostgreSQL and do a full copy of the data directory and
> everything under it to a backup medium or another machine. Keep
> this copy safe for months, too.
>
>> Yesterday I upgraded the server from 8.4.5 to 8.4.7, hoping that
>> this error would go away, but with no success.
>
> Newer versions with more bug fixes may be less likely to contain
> bugs which could cause corruption, but an upgrade like that is
> unlikely to "heal" data which is already corrupted.
>
>> I've been getting occasional errors from the backup script for
>> several months,
>
> Do you know what those were?
>
>> I have upgraded the Linux kernel to 2.6.32, hoping that maybe the
>> problem was in the software RAID driver, but nothing changed; I
>> still get occasional errors.
>
> Occasionally get what errors?
>
>> I still have to run a memory test on the server, but I doubt faulty
>> memory is the problem, because otherwise the server behaves
>> well.
>
> So, no problems other than months of errors on backups? Never any
> OS lockups, power losses, or other abrupt terminations of
> operations?
>
> Also, do you now or have you ever run the database with fsync = off
> or full_page_writes = off?
>
> It is very important to figure out how your data got corrupted;
> otherwise you can't really trust this machine.
>
> -Kevin
