From: | "Wood, Dan" <hexpert(at)amazon(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | VM map freeze corruption |
Date: | 2018-04-18 02:07:17 |
Message-ID: | 84EBAC55-F06D-4FBE-A3F3-8BDA093CE3E3@amazon.com |
Lists: | pgsql-hackers |
pg_check_frozen() reports corrupted VM freeze state.
Found with one of my stress tests. Simplified to the repro below.
The reason for the 33 rows/pages is that I wanted to test if a 2nd vacuum freeze repaired the situation. I was confounded till I discovered SKIP_PAGES_THRESHOLD was 32.
My analysis is that heap_prepare_freeze_tuple->FreezeMultiXactId() returns FRM_NOOP if the transactions holding the MultiXact row locks haven't committed. This results in changed=false and totally_frozen=true (as initialized). When this returns to lazy_scan_heap(), no rows are added to the frozen[] array, yet tuple_totally_frozen is true. This means the page is marked frozen in the VM, even though the MultiXact-locked row was left untouched.
A fix to heap_prepare_freeze_tuple() that seems to do the trick is:
     else
     {
         Assert(flags & FRM_NOOP);
+        totally_frozen = false;
     }
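
To make the control flow above easier to follow, here is a small standalone C model of it. This is only a sketch, not PostgreSQL source: the names model_prepare_freeze, model_page_all_frozen, apply_fix and the MODEL_FRM_* values are made up for illustration, and every FreezeMultiXactId() outcome other than FRM_NOOP is collapsed into a single "xmax removed" case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the FRM_* result flags; values are illustrative. */
#define MODEL_FRM_NOOP            0x01
#define MODEL_FRM_INVALIDATE_XMAX 0x02

/*
 * Toy model of the xmax handling described in the analysis.
 * "flags" plays the role of the FreezeMultiXactId() result and
 * "apply_fix" toggles the proposed one-line change.
 * Returns "changed"; the per-tuple verdict goes to *totally_frozen_p.
 */
static bool
model_prepare_freeze(int flags, bool apply_fix, bool *totally_frozen_p)
{
    bool changed = false;
    bool totally_frozen = true;     /* optimistic initial value */

    if (flags & MODEL_FRM_INVALIDATE_XMAX)
    {
        changed = true;             /* xmax gets removed, tuple is rewritten */
    }
    else
    {
        assert(flags & MODEL_FRM_NOOP);
        /* The multixact xmax stays in place, so the tuple is not frozen. */
        if (apply_fix)
            totally_frozen = false;
    }

    *totally_frozen_p = totally_frozen;
    return changed;
}

/*
 * Toy model of the caller: the page counts as all-frozen only if every
 * tuple reported totally_frozen.
 */
static bool
model_page_all_frozen(const int *tuple_flags, int ntuples, bool apply_fix)
{
    bool all_frozen = true;

    for (int i = 0; i < ntuples; i++)
    {
        bool tuple_totally_frozen;

        (void) model_prepare_freeze(tuple_flags[i], apply_fix,
                                    &tuple_totally_frozen);
        if (!tuple_totally_frozen)
            all_frozen = false;
    }
    return all_frozen;
}
```

With a single tuple whose flags are MODEL_FRM_NOOP, the model reports the page as all-frozen when apply_fix is false and as not all-frozen when it is true, which is the difference the one-line change is meant to make.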
BASH script repro below:
#!/bin/bash
p="psql -h 127.0.0.1 -p 5432 postgres"
echo "create extension if not exists pg_visibility;" | $p
$p <<XXX
drop table if exists t;
create table t (i int primary key, c char(7777));
alter table t alter column c set storage plain;
insert into t select generate_series(0, 32, 1), 'XXX';
XXX
# Start two share lockers in the background
$p <<XXX >/dev/null &
begin;
select i, length(c) from t for share;
select pg_sleep(2);
commit;
XXX
$p <<XXX >/dev/null &
begin;
select i, length(c) from t for share;
select pg_sleep(2);
commit;
XXX
# Freeze while multixact locks are held
echo "vacuum freeze t;" | $p
echo "select pg_check_frozen('t');" | $p
sleep 4; # Wait for share locks to be released
# See if another freeze corrects the problem
echo "vacuum freeze t;" | $p
echo "select pg_check_frozen('t');" | $p