VM map freeze corruption

From: "Wood, Dan" <hexpert(at)amazon(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: VM map freeze corruption
Date: 2018-04-18 02:07:17
Message-ID: 84EBAC55-F06D-4FBE-A3F3-8BDA093CE3E3@amazon.com
Lists: pgsql-hackers

pg_check_frozen() reports corrupted VM freeze state.

Found with one of my stress tests. Simplified to the repro below.

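The reason for the 33 rows/pages is that I wanted to test whether a second vacuum freeze repaired the situation. I was confounded until I discovered that SKIP_PAGES_THRESHOLD is 32: vacuum only skips pages when it finds a run of more than that many skippable pages, so a smaller table gets rescanned anyway.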

My analysis is that heap_prepare_freeze_tuple()->FreezeMultiXactId() returns FRM_NOOP if the transactions holding the MultiXact row locks haven't committed. This results in changed=false and totally_frozen=true (as initialized). When this returns to lazy_scan_heap(), no rows are added to the frozen[] array, yet tuple_totally_frozen is true. This means the page is marked all-frozen in the VM even though the MultiXact row was left untouched, i.e. not frozen.

A fix to heap_prepare_freeze_tuple() that seems to do the trick is:
     else
     {
         Assert(flags & FRM_NOOP);
+        totally_frozen = false;
     }

BASH script repro below:

#!/bin/bash

p="psql -h 127.0.0.1 -p 5432 postgres"

echo "create extension if not exists pg_visibility;" | $p

$p <<XXX
drop table if exists t;
create table t (i int primary key, c char(7777));
alter table t alter column c set storage plain;
insert into t select generate_series(0, 32, 1), 'XXX';
XXX

# Start two share lockers in the background
$p <<XXX >/dev/null &
begin;
select i, length(c) from t for share;
select pg_sleep(2);
commit;
XXX

$p <<XXX >/dev/null &
begin;
select i, length(c) from t for share;
select pg_sleep(2);
commit;
XXX

# Freeze while multixact locks are held
echo "vacuum freeze t;" | $p
echo "select pg_check_frozen('t');" | $p

sleep 4; # Wait for share locks to be released

# See if another freeze corrects the problem
echo "vacuum freeze t;" | $p
echo "select pg_check_frozen('t');" | $p
