Re: select table indicate missing chunk number 0 for toast value 96635 in pg_toast_2619

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: leo xu <leoxu8703(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: select table indicate missing chunk number 0 for toast value 96635 in pg_toast_2619
Date: 2012-05-03 04:26:59
Message-ID: 12138.1336019219@sss.pgh.pa.us
Lists: pgsql-bugs

leo xu <leoxu8703(at)gmail(dot)com> writes:
> I see a lot of "missing chunk number 0 for toast value 96635 in
> pg_toast_2619" messages in the background alert log. select * from iclock
> returns no data and reports missing chunk number 0 for toast value 96635
> in pg_toast_2619.

There is a known bug that can cause that symptom, but it is fixed in
recent update releases. What PG version are you running? If it's
not at least one of the releases cited below, update.

regards, tom lane

Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Branch: master [08e261cbc] 2011-11-01 19:49:58 -0400
Branch: REL9_1_STABLE Release: REL9_1_2 [5e4dd5f63] 2011-11-01 19:48:43 -0400
Branch: REL9_0_STABLE Release: REL9_0_6 [7f797d27f] 2011-11-01 19:48:49 -0400
Branch: REL8_4_STABLE Release: REL8_4_10 [b05ce7550] 2011-11-01 19:48:56 -0400
Branch: REL8_3_STABLE Release: REL8_3_17 [7e03d2849] 2011-11-01 19:49:01 -0400
Branch: REL8_2_STABLE Release: REL8_2_23 [b24e6cafc] 2011-11-01 19:49:06 -0400
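
As a rough aid for the "what version are you running" check, the release list above can be turned into a small lookup. This is only a sketch: the function name has_toast_race_fix is hypothetical, it assumes a plain dotted numeric version string such as "9.1.2" (as reported by SHOW server_version on these releases), and it treats branches newer than 9.1 as fixed because the change is also on master, and branches older than 8.2 as unfixed because they are not in the list.

```python
# Minimum minor release carrying the fix, per branch, copied from the
# commit log above. Everything else in this sketch is illustrative.
FIXED_IN = {
    (9, 1): 2,
    (9, 0): 6,
    (8, 4): 10,
    (8, 3): 17,
    (8, 2): 23,
}


def has_toast_race_fix(version):
    """Return True if the given server version should contain the fix.

    Assumes `version` is a dotted numeric string like "8.4.10".
    """
    parts = [int(p) for p in version.strip().split(".")]
    parts += [0] * (3 - len(parts))      # pad short strings such as "9.2"
    branch, minor = (parts[0], parts[1]), parts[2]

    if branch in FIXED_IN:
        return minor >= FIXED_IN[branch]
    # Not a listed branch: anything after 9.1 descends from master and has
    # the fix; anything before 8.2 was not back-patched.
    return branch > (9, 1)


if __name__ == "__main__":
    for v in ("9.1.1", "9.1.2", "8.4.10", "8.2.22"):
        print(v, has_toast_race_fix(v))
```

A False result means the server predates the releases cited above and should be updated.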

Fix race condition with toast table access from a stale syscache entry.

If a tuple in a syscache contains an out-of-line toasted field, and we
try to fetch that field shortly after some other transaction has committed
an update or deletion of the tuple, there is a race condition: vacuum
could come along and remove the toast tuples before we can fetch them.
This leads to transient failures like "missing chunk number 0 for toast
value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
Hammond and Tim Uckun.

The design idea of syscache is that access to stale syscache entries
should be prevented by relation-level locks, but that fails for at least
two cases where toasted fields are possible: ANALYZE updates pg_statistic
rows without locking out sessions that might want to plan queries on the
same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
any meaningful lock at all.

The least risky fix seems to be an idea that Heikki suggested when we
were dealing with a related problem back in August: forcibly detoast any
out-of-line fields before putting a tuple into syscache in the first place.
This avoids the problem because at the time we fetch the parent tuple from
the catalog, we should be holding an MVCC snapshot that will prevent
removal of the toast tuples, even if the parent tuple is outdated
immediately after we fetch it. (Note: I'm not convinced that this
statement holds true at every instant where we could be fetching a syscache
entry at all, but it does appear to hold true at the times where we could
fetch an entry that could have a toasted field. We will need to be a bit
wary of adding toast tables to low-level catalogs that don't have them
already.) An additional benefit is that subsequent uses of the syscache
entry should be faster, since they won't have to detoast the field.
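
The difference between caching a toast pointer and caching the detoasted value can be illustrated with a toy model. This is not PostgreSQL code: ToastTable, LazyCache, EagerCache and demo are hypothetical names, there is no real MVCC or locking here, and "vacuum" is reduced to deleting a dictionary entry. It only shows why fetching the out-of-line data at load time sidesteps the later lookup failure.

```python
class MissingChunk(Exception):
    """Stands in for the 'missing chunk number 0' error."""


class ToastTable:
    """Toy out-of-line storage: value id -> payload."""

    def __init__(self):
        self.chunks = {}

    def store(self, value_id, data):
        self.chunks[value_id] = data

    def fetch(self, value_id):
        if value_id not in self.chunks:
            raise MissingChunk(
                "missing chunk number 0 for toast value %d" % value_id)
        return self.chunks[value_id]

    def vacuum(self, dead_ids):
        # Remove toast data belonging to outdated parent tuples.
        for value_id in dead_ids:
            self.chunks.pop(value_id, None)


class LazyCache:
    """Keeps only the toast pointer; dereferences it on every use."""

    def __init__(self, toast):
        self.toast = toast
        self.entries = {}

    def load(self, key, value_id):
        self.entries[key] = value_id

    def get(self, key):
        return self.toast.fetch(self.entries[key])


class EagerCache:
    """Detoasts when the entry is loaded, so later use needs no lookup."""

    def __init__(self, toast):
        self.toast = toast
        self.entries = {}

    def load(self, key, value_id):
        self.entries[key] = self.toast.fetch(value_id)

    def get(self, key):
        return self.entries[key]


def demo():
    toast = ToastTable()
    toast.store(96635, "stats payload")
    lazy, eager = LazyCache(toast), EagerCache(toast)
    lazy.load("iclock", 96635)
    eager.load("iclock", 96635)

    # Another transaction replaces the row, then vacuum removes the old
    # toast data while both caches still hold their (now stale) entries.
    toast.vacuum([96635])

    try:
        lazy.get("iclock")
        lazy_ok = True
    except MissingChunk:
        lazy_ok = False
    return lazy_ok, eager.get("iclock")


if __name__ == "__main__":
    print(demo())
```

In this model the eager cache also skips the toast lookup on every later access, which mirrors the speed benefit mentioned above.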

Back-patch to all supported versions. The problem is significantly harder
to reproduce in pre-9.0 releases, because of their willingness to flush
every entry in a syscache whenever the underlying catalog is vacuumed
(cf CatalogCacheFlushRelation); but there is still a window for trouble.
