Quick Links

Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Adam Sjøgren <adsj(at)novozymes(dot)com>, Pgsql General list <pgsql-general(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>, Harry Ambrose <harry(dot)ambrose(at)gmail(dot)com>
Subject:	Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Date:	2018-04-10 10:42:47
Message-ID:	CABOikdPJUv5EyxRZnJY2oA7EKi=bZdoryXNmUGK3dFAQ08tZdw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Fri, Apr 6, 2018 at 8:55 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
wrote:

>
>
> On Fri, Apr 6, 2018 at 2:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> adsj(at)novozymes(dot)com (Adam =?utf-8?Q?Sj=C3=B8gren?=) writes:
>> >> [... still waiting for the result, I will return with what it said
>> >> when the server does ...]
>>
>> > It did eventually finish, with the same result:
>>
>> Huh. So what we have here, apparently, is that regular MVCC snapshots
>> think there is exactly one copy of the 1698936148/0 row, but TOAST fetches
>> think there is more than one. This is darn odd, not least because we
>> never do UPDATEs in toast tables, only inserts and deletes, so there
>> certainly shouldn't be update chains there.
>>
>> It seems like you've got some corner case wherein SnapshotToast sees a row
>> that isn't visible according to MVCC --- probably a row left over from
>> some previous cycle of life. That is, I'm imagining the OID counter
>> wrapped around and we've reused a toast OID, but for some reason there's
>> still a row in the table with that OID. I'm not sure offhand how we could
>> get into such a state. Alvaro, does this ring any bells (remembering that
>> this is 9.3)?
>>
>
> FWIW one of our support customers reported a very similar TOAST table
> corruption issue last week which nearly caused an outage. After a lot of
> analysis, I think I've now fully understood the reasons behind the
> corruption, the underlying bug(s) and possible remedy. I am currently
> working on writing a reproducible test case to demonstrate the problem and
> writing the fix. More details on that soon.
>
>
I've posted a reproducer and a proposed fix to -hackers [1]

In the particular case that I investigated, a database crash/recovery was
involved. But I think we should be able to create a scenario where OID
wrap-around or a standby promotion triggers the problem. I don't know if
any of that was involved in the cases reported on this thread, but I've a
strong suspicion that the underlying bug is probably the same.

Thanks,
Pavan

[1] https://www.postgresql.org/message-id/CABOikdOgWT2hHkYG3Wwo2cyZJq2zf
s1FH0FgX-%3Dh4OLosXHf9w%40mail.gmail.com

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100 at 2018-04-06 03:25:37 from Pavan Deolasee

Responses

Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100 at 2018-04-10 11:52:20 from Adam =?utf-8?Q?Sj=C3=B8gren?=

Browse pgsql-general by date

	From	Date	Subject
Next Message	Alexandre Arruda	2018-04-10 11:31:20	Re: ERROR: found multixact from before relminmxid
Previous Message	hmidi slim	2018-04-10 10:24:49	Using enum instead of join tables