From: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Adam Sjøgren <adsj(at)novozymes(dot)com>, Pgsql General list <pgsql-general(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Subject: | Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100 |
Date: | 2018-04-06 03:25:37 |
Message-ID: | CABOikdPe3Zq8VXVH+QWb2Kj6JemFyuz4y91SwmfKdNL=BsHo9w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Fri, Apr 6, 2018 at 2:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> adsj(at)novozymes(dot)com (Adam =?utf-8?Q?Sj=C3=B8gren?=) writes:
> >> [... still waiting for the result, I will return with what it said
> >> when the server does ...]
>
> > It did eventually finish, with the same result:
>
> Huh. So what we have here, apparently, is that regular MVCC snapshots
> think there is exactly one copy of the 1698936148/0 row, but TOAST fetches
> think there is more than one. This is darn odd, not least because we
> never do UPDATEs in toast tables, only inserts and deletes, so there
> certainly shouldn't be update chains there.
>
> It seems like you've got some corner case wherein SnapshotToast sees a row
> that isn't visible according to MVCC --- probably a row left over from
> some previous cycle of life. That is, I'm imagining the OID counter
> wrapped around and we've reused a toast OID, but for some reason there's
> still a row in the table with that OID. I'm not sure offhand how we could
> get into such a state. Alvaro, does this ring any bells (remembering that
> this is 9.3)?
>
FWIW one of our support customers reported a very similar TOAST table
corruption issue last week which nearly caused an outage. After a lot of
analysis, I think I've now fully understood the reasons behind the
corruption, the underlying bug(s) and possible remedy. I am currently
working on writing a reproducible test case to demonstrate the problem and
writing the fix. More details on that soon.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2018-04-06 06:24:56 | Re: pg_basebackup or dump for starting replication process |
Previous Message | Alexandre Arruda | 2018-04-06 00:09:23 | ERROR: found multixact from before relminmxid |