Re: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619
Date: 2021-05-16 22:27:48
Message-ID: 20210516222748.h2ooucfwrua7ytzm@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-05-16 16:23:02 -0400, Tom Lane wrote:
> And the reason oldestXID contains that is that pg_upgrade applied
> pg_resetwal, which does this:
>
>     /*
>      * For the moment, just set oldestXid to a value that will force
>      * immediate autovacuum-for-wraparound. It's not clear whether adding
>      * user control of this is useful, so let's just do something that's
>      * reasonably safe. The magic constant here corresponds to the
>      * maximum allowed value of autovacuum_freeze_max_age.
>      */
>     ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
>     if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
>         ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;

Yea - this is causing quite a few problems... See
https://www.postgresql.org/message-id/20210423234256.hwopuftipdmp3okf%40alap3.anarazel.de
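
To make the arithmetic in the quoted snippet concrete, here is a minimal standalone sketch. It is not PostgreSQL source: resetwal_oldest_xid and FIRST_NORMAL_XID are hypothetical stand-ins, and the sketch assumes TransactionId is an unsigned 32-bit integer and FirstNormalTransactionId is 3.

```c
#include <stdint.h>

/* Assumed stand-in for FirstNormalTransactionId. */
#define FIRST_NORMAL_XID 3u

/*
 * Hypothetical helper mimicking the quoted pg_resetwal logic.
 * The subtraction is done in unsigned 32-bit arithmetic, so for any
 * set_xid below 2000000000 it wraps around modulo 2^32.
 */
static uint32_t
resetwal_oldest_xid(uint32_t set_xid)
{
    uint32_t oldest = set_xid - 2000000000u;

    /* Skip over the special (non-normal) xids 0..2. */
    if (oldest < FIRST_NORMAL_XID)
        oldest += FIRST_NORMAL_XID;
    return oldest;
}
```

For a freshly upgraded cluster with a small next xid, say 1000, resetwal_oldest_xid(1000) evaluates to 2294968296. That value is numerically far larger than the next xid and is only meaningful when read as "2 billion xids in the past" under modulo-2^32 comparison.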

> So it seems like we should do some combination of these things:
>
> 1. Fix FullXidRelativeTo to be a little less trusting. It'd
> probably be sane to make it return FirstNormalTransactionId
> when it'd otherwise produce a wrapped-around FullXid, but is
> there any situation where we'd want it to throw an error instead?

I'm wondering whether we should *always* make it an error, and fix the
places where that causes problems.
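
The wraparound under discussion can be sketched in isolation. The code below is an illustration under stated assumptions, not the procarray.c implementation: it assumes a full xid is a 64-bit value with the epoch in the upper 32 bits, and that the conversion adds the signed 32-bit distance between the xid and the reference. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Unchecked widening: interpret xid as lying within +/- 2^31 of the
 * reference full xid rel. If rel is small (epoch 0) and xid is
 * "in the past" by more than rel, the result underflows and wraps
 * to a huge 64-bit value.
 */
static uint64_t
full_xid_relative_to(uint64_t rel, uint32_t xid)
{
    int32_t delta = (int32_t) (xid - (uint32_t) rel);

    return rel + (uint64_t) (int64_t) delta;
}

/*
 * Hypothetical checked variant: returns false instead of producing a
 * wrapped-around value. The caller can then either raise an error or
 * clamp to the first normal xid, the two options weighed above.
 */
static bool
full_xid_relative_to_checked(uint64_t rel, uint32_t xid, uint64_t *out)
{
    int32_t delta = (int32_t) (xid - (uint32_t) rel);

    if (delta < 0 && (uint64_t) (-(int64_t) delta) > rel)
        return false;           /* would go below full xid 0 */
    *out = rel + (uint64_t) (int64_t) delta;
    return true;
}
```

With rel = 1000 and xid = 2294968296 (a 32-bit xid 2 billion behind 1000), the unchecked version yields a value whose upper 32 bits are all ones, while the checked version reports failure. With a reference in epoch 1 the same xid converts cleanly.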

> 2. Change pg_resetwal to not do the above. It's not entirely
> apparent to me what business it has trying to force
> autovacuum-for-wraparound anyway, but if it does need to do that,
> can we devise a less klugy method?

Yes, see the above email. I think we really need to transport an accurate
oldest xid + epoch for pg_upgrade.

> It also seems like some assertions in procarray.c would be a
> good idea. With the attached patch, we get through core
> regression just fine, but the pg_upgrade test fails immediately
> after the "Resetting WAL archives" step.

Agreed.

Greetings,

Andres Freund
