Re: Corrupted Data ?

From: Ioana Danes <ioanadanes(at)gmail(dot)com>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Corrupted Data ?
Date: 2016-08-12 15:10:53
Message-ID: CAPg0s+5eio1JM2yEzAF5r+K9jnOYPTCvm9g55eDyvFaOPnOGYQ@mail.gmail.com
Lists: pgsql-general

On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte <folarte(at)peoplecall(dot)com>
wrote:

> CCing to the list...
>
> Thanks

> On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes <ioanadanes(at)gmail(dot)com> wrote:
> >> given 318220 and 318216 are just a bit away ( 4db08/4db0c ), and it
> >> repeats sporadically, have you ruled out ( by having page checksums or
> >> other mechanism ) a potential disk read/write error ?
> >>
> >>
> >> > Also the index is correct on db3 as the record in case (with drawid =
> >> > 318216) is retrieved if I filter by drawid = 318220
> >>
> >> Especially if this happens, you may have slightly bad disks/RAM
> >> leading to this kind of problem.
> >>
> >
> > Could be. I also had some issues with an rsync between db3 and drdb a
> > week ago that did not complete for bigger files (> 200MB) and gave me
> > some corruption messages. Then the system was rebooted and everything
> > seemed fine, but apparently it is not.
> > I am planning to drop & create the table from a good backup, and if
> > that does not fix the issue then I will rebuild the server.
>
> I would check whatever logs you can ( syslog or eventlog, smart log,
> etc.. ) hunting for disk errors ( sometimes they are reported ). This
> kind of problem, with programs as well tested as postgres and rsync,
> tends to indicate a controller/RAM/disk going bad ( in your case it
> could be caused by a single bit getting flipped in a sector for the
> data portion of the table, and not being propagated either because it
> happened after your sync of drdb or because it was synced from the WAL
> and not the table, or because it was read from the disk cache ).
>
I agree. Unfortunately, I did not find any clues about corruption or any
anomalies in the logs.
I will work tonight to rebuild that table and see where I go from there.

Thanks,
ioana

> Francisco Olarte.
>
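
For reference, the "4db08/4db0c" remark quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name differing_bits is made up for this note and is not part of PostgreSQL or any tool mentioned in the thread. It XORs the two drawid values and lists the bit positions where they differ.

```python
from typing import List


def differing_bits(a: int, b: int) -> List[int]:
    """Return the bit positions (0 = least significant) where a and b differ.

    Hypothetical helper for illustration only.
    """
    diff = a ^ b  # XOR leaves a 1 in every position where the inputs differ
    return [pos for pos in range(diff.bit_length()) if (diff >> pos) & 1]


if __name__ == "__main__":
    # The two drawid values discussed in the thread.
    first, second = 318216, 318220
    print(hex(first), hex(second))
    print(differing_bits(first, second))
```

The two values differ in exactly one bit position. That is consistent with the single-bit-flip hypothesis, though it does not by itself prove a hardware fault.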
