Re: BUG #5929: ERROR: found toasted toast chunk for toast value 260340218 in pg_toast_260339342

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tambet Matiisen" <tambet(dot)matiisen(at)gmail(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5929: ERROR: found toasted toast chunk for toast value 260340218 in pg_toast_260339342
Date: 2011-03-16 15:09:52
Message-ID: 4D808C70020000250003B98B@gw.wicourts.gov
Lists: pgsql-bugs

Tambet Matiisen <tambet(dot)matiisen(at)gmail(dot)com> wrote:

> Pre-live database is restored from live database dump every night.

How is that done? A single pg_dump of the entire live database
restored using psql? Are both database servers at the same
PostgreSQL version?

> So far the errors have been in pre-live database,

You're running pg_dump against a database you just restored from a
pg_dump image?

> Usually the next day error was gone. I mostly blamed badly timed
> backup and restore scripts, although this shouldn't result in
> errors.

No, it shouldn't -- if you're following any of the documented backup
and restore techniques. I have a suspicion that you're just doing a
file copy without stopping the live database or properly following
the documented PITR backup and recovery techniques.

> The errors started from 07.09.2010, when I was still running
> PostgreSQL 8.1. Few examples:
>
> 07.09.2010:
> Warning: pg_dump: ERROR: could not open relation with OID
> 339815468

> [additional errors which could be caused by copying a database
> while running without proper PITR techniques]

> The current error has occurred 3 days in a row - 13-15.03.2011:
> Warning: pg_dump: SQL command failed pg_dump: Error message from
> server:
> ERROR: found toasted toast chunk for toast value 260340218 in
> pg_toast_260339342

> This time the error is not in pre-live database and therefore it
> doesn't go away.

If I understand you, this sounds like corruption in the live
database; nothing on the pre-live database is part of causing this
problem.

> The server is also running [...] Samba [...]

I hope you're not trusting Samba too far. For a while we were using
it in backups across our WAN, and it mangled at least one file
almost every day. We had to take to running md5sum against both
ends for each file to ensure we didn't get garbage (until we
converted everything to use TCP communications, which have never
mangled anything for us).
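
The checksum comparison described above can be sketched roughly as
follows. This is only an illustration, not the script we actually
ran: the function names and the directory layout are assumptions, and
it presumes both copies are reachable as local paths (for a remote
copy you would compute the digests on each end and compare the
lists).

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir, copy_dir):
    """List relative paths whose copy is missing or has a different digest.

    Hypothetical helper: source_dir is the original, copy_dir the transferred copy.
    """
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    mismatches = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        copied_file = copy_dir / relative
        if not copied_file.is_file() or md5_of_file(source_file) != md5_of_file(copied_file):
            mismatches.append(str(relative))
    return mismatches
```

An empty list from find_mismatches() means every file under the
source directory arrived with the same digest; anything listed should
be copied again before the backup is trusted.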

> Both fsync and full_page_writes are on.

Good. Without those an OS or hardware crash can corrupt your
database.
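
To confirm what a given server's configuration file actually says, a
rough check along these lines may help. It is a sketch under stated
assumptions: it only reads the text of a postgresql.conf-style file,
treats a setting that is absent or commented out as "on" (the default
for both of these), lets the last assignment win, and does not follow
include directives or see values overridden elsewhere. The function
names are made up for illustration. Running SHOW fsync; and
SHOW full_page_writes; against the live server remains the
authoritative check.

```python
TRUTHY = {"on", "true", "yes", "1"}


def read_settings(conf_text, names=("fsync", "full_page_writes")):
    """Return the effective textual value of each named setting.

    Assumption: a setting not assigned in the file keeps its default of "on".
    """
    values = {name: "on" for name in names}
    for line in conf_text.splitlines():
        # Drop trailing comments; a fully commented-out line becomes empty.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # The "=" between name and value is optional in this file format.
        parts = line.replace("=", " ", 1).split()
        if len(parts) >= 2 and parts[0].lower() in values:
            values[parts[0].lower()] = parts[1].strip("'").lower()
    return values


def crash_safe(conf_text):
    """True when both fsync and full_page_writes appear to be enabled."""
    return all(value in TRUTHY for value in read_settings(conf_text).values())
```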

> OK, I don't have UPS for this machine, but power has been stable.
> Current uptime is 32 days, which I bet is from the last kernel
> update.

OK. A power outage wouldn't be too likely to matter if you have
fsync and full_page_writes on.

> Currently I blame either faulty memory or faulty software RAID
> driver. I can easily eliminate the memory cause by running
> memtest86 for few hours

Is this ECC memory? If not, even a good test doesn't prove that a
RAM problem didn't cause the corruption.

> Now, off to buy UPS...

Not a bad idea, but it doesn't sound like lack of that is likely to
have caused the corruption in your live database, based on the
settings you mentioned. (Assuming those settings are in use on the
live server.)

-Kevin
