Re: Vacuum & Memory Exhausted error

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Spiegelberg <gspiegelberg(at)cranel(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Vacuum & Memory Exhausted error
Date: 2003-06-17 17:07:19
Message-ID: 13740.1055869639@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Greg Spiegelberg <gspiegelberg(at)cranel(dot)com> writes:
> I have isolated the table but can't seem to find the info on finding
> the bad row and removing it.

Well, it's not rocket science, you just do trial and error to see which
rows you can select without getting the error. I'd try a SELECT
COUNT(*) first to check that there is no corruption of tuple headers.
If that works, identify which column contains the damage by seeing
whether you can do SELECT max(col) for each column left-to-right.
Then identify the broken row by doing
SELECT broken-col FROM table OFFSET n LIMIT 1
for various values of n --- this reads and discards n rows then reads
and returns another, so if it doesn't fail then the first n+1 rows are
good. If you understand the principle of binary search you can home
in on the target row quickly. Actually it's a little bit complicated
because of an undocumented fact: in current releases the LIMIT mechanism
actually reads one more row than it needs to, so when you have narrowed
down the exact n at which it first fails, the broken row is actually
identifiable by
SELECT ctid FROM table OFFSET n+1 LIMIT 1
Double-check that you have identified the right row by verifying that
SELECT * FROM table WHERE ctid = 'ctid obtained above'
blows up --- if not, you're off by one in the LIMIT stuff.

Once you have found the broken row, learn what you can from it (with
luck you can select at least the first few columns) and then delete it
by ctid.

This all assumes that there is exactly one point of corruption, which is
a really bad assumption when dealing with real cases. Keep in mind that
there is likely to be more than one broken row, and that some of the
corruption may show only as incorrect values and not anything that
provokes an error report. Once you can select all the data in the
table, do what you can to validate your data.

BTW, "ctid" is the physical location of a row, which is expressed in the
form '(block number, line number)'. Once you have determined which
block(s) contain broken data, it would be interesting to dump them out
with a tool like pg_filedump (see http://sources.redhat.com/rhdb/).
The pattern of wrong data might possibly suggest something about the
cause.

regards, tom lane

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message JR Richards 2003-06-17 18:18:57 Database Encryption
Previous Message Donald Fraser 2003-06-17 16:58:24 Redhat 9.0