Re: BUG #16352: Data corruption in few tables

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: KrishnaKanth BN <kittu(dot)kkp(at)gmail(dot)com>
Cc: krishnakanth <krishnakanth_bn(at)infosys(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16352: Data corruption in few tables
Date: 2020-04-15 21:38:16
Message-ID: 20200415213816.qupnu726pct6mryn@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Apr 15, 2020 at 10:39:19PM +0530, KrishnaKanth BN wrote:
>Hi All,
>
>Can you please help us on this issue?
>

Well, this very much sounds like a case of data corruption caused by
storage - either at OS or hardware level. You'll have to dig in dmesg
and other system logs I guess.

>On Wed 8 Apr, 2020, 21:30 PG Bug reporting form, <noreply(at)postgresql(dot)org>
>wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference: 16352
>> Logged by: KrishnaKanth BN
>> Email address: Krishnakanth_bn(at)infosys(dot)com
>> PostgreSQL version: 9.6.13
>> Operating system: Docker image postgres:9.6.13
>> Description:
>>
>> Our Application is using Postgres Version (Image- postgres:9.6.13) and we
>> are running it as Docker Container with Volume mounted to a NFS storage. We
>> came across a peculiar issue.
>>

Well, I wouldn't be surprised if this was related to NFS.

>> We had enabled unique key constraint on a particular table, it was violated
>> and had duplicate data. When this table was queried in the application, it
>> was resulting in chunk errors. This had major outage in the application.
>>

I very much doubt creating a unique constraint would cause this, even if
the creation failed because of duplicate values. It might have made the
issue visible, but that's a different thing.

>> ERROR: missing chunk number 0 for toast value 241012 in pg_toast_2619
>> STATEMENT: SELECT distinct role_name FROM tapplication_roles INNER JOIN
>> troles ON tapplication_roles.role_id = troles.role_id where user_id in
>> ('user1','value1')
>>

This more of less confirms it has nothing to do with the constaint,
because "missing chunk" refers to TOAST table (we don't have chunks
anywhere else) and that's only for tables, not for indexes.

>> We followed several posts and somehow managed to remove the duplicate rows
>> and reindexed and vacuumed the table. We also had to manually remove
>> duplicate entries in pg_statistic table and reindex and vaccum it.
>>
>> There are couple of other tables in which few rows are corrupted. It is
>> because of a corrupt value in one of the columns.
>>

Well, it'd be useful to actually show us what you did etc.

>> We did not see any error in the container logs to isolate this issue when
>> it
>> first occurred.
>>
>> Can you please throw some light what might have caused the corruption and
>> how can we prevent it?
>>

Most likely some sort of storage issue - not necessarily at hardware
level, but it might be at OS level. Not sure if docker, or maybe it's a
networking / NFS issue.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2020-04-15 23:35:57 Re: BUG #16369: Segmentation Faults and Data Corruption with Generated Columns
Previous Message David G. Johnston 2020-04-15 20:21:25 Re: BUG #16368: Incorrect function inlining in the presence of a window function