Annoying corruption in PostgreSQL.

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Annoying corruption in PostgreSQL.
Date: 2023-10-27 12:19:27
Message-ID: CALdSSPhmqoN02ciT4UxS6ax0N84NpRwPWm87nKJ_+0G-Na8qOQ@mail.gmail.com
Lists: pgsql-hackers

Hi hackers!

We run a large number of PostgreSQL clusters in production. They differ
in version (we run PostgreSQL 11 through 16), load, amount of data, schema,
etc. From time to time, PostgreSQL reports corruption. It says
ERROR,XX001,"missing chunk number 0 for toast value 18767319 in
pg_toast_2619",,,,,,"vacuum full ;"

in the logs. The missing chunk number is almost always zero, while the
other values vary. There is no known pattern that triggers this issue.
Moreover, if we rerun the VACUUM statement against the relation named in
the log message, it succeeds every time. So, we just ignore these errors.
Maybe it is just some weird data race?
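
A minimal sketch, assuming the log lines look like the one quoted above, of
how these errors could be tallied across clusters to look for a pattern. The
function names (parse_toast_errors, summarize) are hypothetical and not part
of PostgreSQL; only the message text is taken from the log excerpt.

```python
import re
from collections import Counter

# Matches the message text quoted above. \s+ tolerates the line wrap
# between "in" and the TOAST relation name.
TOAST_ERROR = re.compile(
    r"missing chunk number (\d+) for toast value (\d+)\s+in\s+(pg_toast_\d+)"
)


def parse_toast_errors(log_text):
    """Return (chunk_number, toast_value, toast_relation) for each match."""
    return [
        (int(chunk), int(value), relation)
        for chunk, value, relation in TOAST_ERROR.findall(log_text)
    ]


def summarize(log_text):
    """Count errors per chunk number and per TOAST relation."""
    errors = parse_toast_errors(log_text)
    by_chunk = Counter(chunk for chunk, _, _ in errors)
    by_relation = Counter(relation for _, _, relation in errors)
    return by_chunk, by_relation


if __name__ == "__main__":
    # The single log entry quoted in this message, including its line wrap.
    sample = (
        'ERROR,XX001,"missing chunk number 0 for toast value 18767319 in\n'
        'pg_toast_2619",,,,,,"vacuum full ;"'
    )
    print(parse_toast_errors(sample))
    print(summarize(sample))
```

The number in a pg_toast_N name is the OID of the relation that owns the
TOAST table, so grouping by relation shows which tables are affected; in a
stock catalog, OID 2619 should be pg_statistic, though that is worth
confirming on the affected clusters.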

We don't know how to trigger this problem or why it occurs. I'm not asking
you to resolve this issue, only to help with debugging. What can we do to
deduce the cause of the failures? Maybe we can add more logging somewhere
(we can deploy a specially patched PostgreSQL version everywhere), so that
we have more information about the issue the next time it happens?
