Re: could not access status of transaction

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: chenhj <chjischj(at)163(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: could not access status of transaction
Date: 2020-01-06 18:39:35
Message-ID: CA+TgmoYc0cmQKd+ogi=BwRqwnQ21ooSEj0O84wKenxAiPzZT+Q@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 5, 2020 at 11:00 PM chenhj <chjischj(at)163(dot)com> wrote:
> According to the above information, the flags of the heap page (163363) containing the problem tuple (163363, 9) are 0x0001 (HAS_FREE_LINES); that is, ALL_VISIBLE is not set.
>
> However, according to the hexdump of the corresponding vm file, that block (location 9F88 + 6 bits) has the VISIBILITYMAP_ALL_FROZEN and VISIBILITYMAP_ALL_VISIBLE flags set. That is, the heap file and the vm file are inconsistent.
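
[Editor's sketch, not part of the original exchange.] The "location + bit" arithmetic in the quoted report can be reproduced roughly as follows. This assumes a default build (8192-byte blocks, a 24-byte page header on each visibility map page, 2 bits per heap block); the function names are hypothetical and only mirror the idea of the mapping, not the server's actual code.

```python
# Assumptions (default PostgreSQL build, not stated in the thread):
BLCKSZ = 8192                  # block size in bytes
PAGE_HEADER_SIZE = 24          # MAXALIGN'd page header on each VM page
BITS_PER_HEAPBLOCK = 2         # all-visible bit + all-frozen bit

HEAPBLOCKS_PER_BYTE = 8 // BITS_PER_HEAPBLOCK        # 4 heap blocks per VM byte
MAPSIZE = BLCKSZ - PAGE_HEADER_SIZE                  # usable map bytes per VM page
HEAPBLOCKS_PER_PAGE = MAPSIZE * HEAPBLOCKS_PER_BYTE  # heap blocks covered per VM page

VISIBILITYMAP_ALL_VISIBLE = 0x01
VISIBILITYMAP_ALL_FROZEN = 0x02


def vm_location(heap_blk):
    """Return (vm_page, byte_within_map, bit_offset) for a heap block number."""
    vm_page = heap_blk // HEAPBLOCKS_PER_PAGE
    map_byte = (heap_blk % HEAPBLOCKS_PER_PAGE) // HEAPBLOCKS_PER_BYTE
    bit_offset = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK
    return vm_page, map_byte, bit_offset


def vm_file_offset(heap_blk):
    """Byte offset in the raw vm file, counting page headers."""
    vm_page, map_byte, _ = vm_location(heap_blk)
    return vm_page * BLCKSZ + PAGE_HEADER_SIZE + map_byte


def vm_payload_offset(heap_blk):
    """Byte offset counting only map bytes, with page headers skipped."""
    vm_page, map_byte, _ = vm_location(heap_blk)
    return vm_page * MAPSIZE + map_byte


def vm_bits(vm_byte_value, heap_blk):
    """Extract the two status bits for heap_blk from the VM byte holding them."""
    _, _, bit_offset = vm_location(heap_blk)
    return (vm_byte_value >> bit_offset) & 0x03


if __name__ == "__main__":
    blk = 163363
    print(vm_location(blk))
    print(hex(vm_file_offset(blk)), hex(vm_payload_offset(blk)))
```

Under these assumptions the bit offset for block 163363 comes out as 6, which agrees with the "+ 6" in the report, and the header-less byte offset comes out as 0x9F88; the offset in the raw file differs because it also counts the page headers.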

That's not supposed to happen, and represents data corruption. Your
previous report of a too-old xmin surviving in the heap is also
corruption. There is no guarantee that both problems have the same
cause, but suppose they do. One possibility is that a write to the
heap page may have gotten lost or undone. Suppose that, while this
page was in shared_buffers, VACUUM came through and froze it, setting
the bits in the VM and later truncating CLOG. Then, suppose that when
that page was evicted from shared_buffers, it didn't really get
written back to disk, or alternatively it did, but then later somehow
the old version reappeared. I think that would produce these symptoms.
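
[Editor's sketch, not part of the original message.] The lost-write scenario above can be illustrated with a toy model. It is only a sketch: the flag values are the ones I believe PostgreSQL uses (PD_ALL_VISIBLE = 0x0004 on the heap page, 0x01/0x02 in the visibility map), and `simulate` and `is_consistent` are hypothetical helpers, not server functions. The model ignores WAL, CLOG truncation, and locking entirely.

```python
# Heap page header flags (pd_flags) and visibility map bits; assumed values.
PD_HAS_FREE_LINES = 0x0001
PD_ALL_VISIBLE = 0x0004
VM_ALL_VISIBLE = 0x01
VM_ALL_FROZEN = 0x02


def is_consistent(heap_flags, vm_bits):
    """Check the invariants the quoted report found violated.

    A VM all-visible bit requires PD_ALL_VISIBLE on the heap page, and
    all-frozen requires all-visible. The converse does not have to hold.
    """
    if (vm_bits & VM_ALL_VISIBLE) and not (heap_flags & PD_ALL_VISIBLE):
        return False
    if (vm_bits & VM_ALL_FROZEN) and not (vm_bits & VM_ALL_VISIBLE):
        return False
    return True


def simulate(write_lost):
    """Toy model: VACUUM freezes a buffered page, then the page is evicted.

    Returns (heap_flags_on_disk, vm_bits_on_disk) after the eviction.
    """
    disk_heap_flags = PD_HAS_FREE_LINES   # on-disk state before VACUUM
    buffer_heap_flags = disk_heap_flags   # copy held in shared_buffers
    disk_vm_bits = 0

    # VACUUM freezes the page in the buffer and sets both VM bits.
    buffer_heap_flags |= PD_ALL_VISIBLE
    disk_vm_bits |= VM_ALL_VISIBLE | VM_ALL_FROZEN

    # Eviction: the write either reaches disk or is lost (or later undone).
    if not write_lost:
        disk_heap_flags = buffer_heap_flags

    return disk_heap_flags, disk_vm_bits


if __name__ == "__main__":
    for lost in (False, True):
        heap_flags, vm_bits = simulate(write_lost=lost)
        print(lost, hex(heap_flags), hex(vm_bits), is_consistent(heap_flags, vm_bits))
```

With the write lost, the model ends with heap flags 0x0001 and both VM bits set, which is the combination described in the quoted report.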

I think that bad hardware could cause this, or running two copies of
the server on the same data files at the same time, or maybe some kind
of filesystem-related flakiness, especially if, for example, you are
using a network filesystem like NFS, or maybe a broken iSCSI stack.
There is also no reason it couldn't be a bug in PostgreSQL itself,
although if we lost page writes routinely somebody would surely have
noticed by now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
