Re: Incomplete pg_dump operation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: peter(at)vfemail(dot)net
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Incomplete pg_dump operation
Date: 2010-02-08 23:04:58
Message-ID: 855.1265670298@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

peter(at)vfemail(dot)net writes:
> After su'ing to pgsql, I executed a:

> pg_dump -t xyz database_name > /usr/local/pgsql/db_backups/database_name-Manual-Xyz-Table-Dump

> command, and this message was returned:

> pg_dump: ERROR: could not open relation with OID 2196359751
> pg_dump: SQL command to dump the contents of table "xyz" failed: PQendcopy() failed.
> pg_dump: Error message from server: ERROR: could not open relation with OID 2196359751
> pg_dump: The command was: COPY public.xyz ({various field names}) TO stdout;

> That's the same OID other error messages have complained about.

Hmm. AFAICS, a COPY OUT operation should only try to read the given
table (well, and some system tables, but those have small OIDs). The
idea that comes to mind now is that you have a corrupt TOAST pointer
--- or at least data that the code is taking as a TOAST pointer ---
in the xyz table. If that contained 2196359751 in its va_toastrelid
field, that would produce the observed symptoms.

If this theory is correct then there is one damaged row, or maybe
even just one damaged field in a row, somewhere in xyz. (Or maybe
there is more damage after the first one, but anyway it's probably
somewhat localized.) If you can delete the damaged row(s) then you
should be able to dump the rest of the data, which will be a big
step forward from where you are now.

You should be able to home in on the location of the damaged row by
doing "select * from xyz limit N" for various N and seeing what's
the largest N that doesn't fail. Then "select ctid from xyz
offset N limit 1" should give you the ctid of the damaged row
--- confirm by seeing that "select * from xyz where ctid = 'whatever'"
does fail.

After that, you could try "delete from xyz where ctid = 'whatever'" but
I expect that this will fail just like selecting it does. What you'll
probably have to do is stop the postmaster and manually zero the block
containing the row with dd or similar tool. If you search the postgres
archives for previous discussions of recovering from corrupted data, you
should find lots of details about this type of process. It doesn't
come up often enough for anyone to have tried to automate it though.

It might also be interesting to get a dump of the damaged block
(see pg_filedump) so we can try to get an idea of exactly what
happened.

regards, tom lane

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tim Landscheidt 2010-02-09 00:45:12 Re: Novice PL/pgSQL question and example
Previous Message James Long 2010-02-08 21:44:09 Re: Novice PL/pgSQL question and example