visibility map corruption

From: Floris Van Nee <florisvannee(at)Optiver(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: visibility map corruption
Date: 2021-07-04 20:43:50
Message-ID: 87da83168c644fd9aae38f546cc70295@opammb0562.comp.optiver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

We recently ran into an issue where the visibility map of a relation was corrupt, running Postgres 12.4. The error we'd get when running a SELECT * from this table is:

could not access status of transaction 3704450152
DETAIL: Could not open file "pg_xact/0DCC": No such file or directory.

On the lists I could find several similar reports, but corruption like this could obviously have a very wide range of root causes.. see [1] [2] [3] for example - not all of them have their root cause known.

This particular case was similar to reported cases above, but also has some differences.

The following query returns ~21.000 rows, which indicates something inconsistent between the visibility map and the pd_all_visible flag on the page:

select * from pg_check_frozen('tbl');

Looking at one of the affected pages with pageinspect:

=# SELECT lp,lp_off,lp_flags,lp_len,t_xmin,t_xmax,t_field3,t_ctid,t_infomask2,t_infomask,t_hoff,t_oid FROM heap_page_items(get_raw_page('tbl', 726127));
┌────┬────────┬──────────┬────────┬────────────┬────────────┬──────────┬────────────┬─────────────┬────────────┬────────┬───────┐
│ lp │ lp_off │ lp_flags │ lp_len │ t_xmin │ t_xmax │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_oid │
├────┼────────┼──────────┼────────┼────────────┼────────────┼──────────┼────────────┼─────────────┼────────────┼────────┼───────┤
│ 1 │ 6328 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │ (726127,1) │ 249 │ 8339 │ 56 │ ∅ │
│ 2 │ 4464 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │ (726127,2) │ 249 │ 8339 │ 56 │ ∅ │
│ 3 │ 2600 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │ (726127,3) │ 249 │ 8339 │ 56 │ ∅ │
│ 4 │ 680 │ 1 │ 1920 │ 3704450155 │ 3704450155 │ 1 │ (726127,4) │ 249 │ 8339 │ 56 │ ∅ │
└────┴────────┴──────────┴────────┴────────────┴────────────┴──────────┴────────────┴─────────────┴────────────┴────────┴───────┘

t_infomask shows that HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID bits are both unset.
This pg_visibility() call shows the inconsistency between VM and page, with PD_ALL_VISIBLE=false

=# select * from pg_visibility('tbl', 726127);
┌─────────────┬────────────┬────────────────┐
│ all_visible │ all_frozen │ pd_all_visible │
├─────────────┼────────────┼────────────────┤
│ t │ t │ f │
└─────────────┴────────────┴────────────────┘

Looking at other pages show the same information.
What's interesting is that out of the affected tuples returned by pg_check_frozen, over 99% belong to 1 transaction (3704450155 as seen above) and the remaining few are from one other transaction that occurred at roughly the same time.
I find it hard to believe that this is due to some random bit flipping, because many pages are affected, but the "incorrect" ones are in total only from two specific transactions which occurred at roughly the same time. There were also no server crashes or other known failures around the time of this transaction. I'm not ruling out any other kind of failure still, but I also cannot really explain how this could have happened.

The server has PG12.4 with full_page_writes=on, data_checksums=off. It's a large analytics database. The VM inconsistencies also occur on the streaming replicas.

I realize these cases are pretty rare and hard to debug, but I wanted to share the information I found so far here for reference. Maybe someone has an idea what occurred, or maybe someone in the future finds it useful when he finds something similar.

I have no idea how the inconsistency between VM and PD_ALL_VISIBLE started - from looking through the code I can't really find any way how this could occur. However, for it to lead to the problem described here, I believe there should be *no* SELECT that touches that particular page after the insert/update transaction and before the transaction log gets truncated. If the page is read before the transaction log gets truncated, then the hint bit HEAP_XMIN_COMMITTED will get set and future reads will succeed regardless of tx log truncation. One of the replica's had this happen to it: the affected pages are identical to the primary except that the HEAP_XMIN_COMMITTED flag is set (note that the VM inconsistency is still there on the replica though: PD_ALL_VISIBLE=false even though VM shows that all_frozen=all_visible=true). But I can query these rows on the replica without issues, because it doesn't check the tx log when it sees that HEAP_XMIN_COMMITTED is set.

-Floris

[1] https://postgrespro.com/list/thread-id/2422376
[2] https://postgrespro.com/list/thread-id/2501800
[3] https://postgrespro.com/list/thread-id/2321949

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-07-04 20:51:39 Re: visibility map corruption
Previous Message Fabien COELHO 2021-07-04 20:29:48 Re: rand48 replacement