Re: Reviewing freeze map code

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-06-10 19:54:10
Message-ID: CA+TgmobcpoZQo78nSh2aa8or4phY6rk+tmJtWdJ9QsNWuyhb1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 10, 2016 at 1:59 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Master, but with an existing standby. So it could be related to
>> hot_standby_feedback or such.
>
> I just managed to trigger it again.
>
>
> #1 0x00007fa1a73778da in __GI_abort () at abort.c:89
> #2 0x00007f9f1395e59c in record_corrupt_item (items=items(at)entry=0x2137be0, tid=0x7f9fb8681c0c)
> at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:612
> #3 0x00007f9f1395ead5 in collect_corrupt_items (relid=relid(at)entry=29449, all_visible=all_visible(at)entry=0 '\000', all_frozen=all_frozen(at)entry=1 '\001')
> at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:572
> #4 0x00007f9f1395f476 in pg_check_frozen (fcinfo=0x7ffe5343a200) at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:292
> #5 0x00000000005fdbec in ExecMakeTableFunctionResult (funcexpr=0x2168630, econtext=0x2168320, argContext=<optimized out>, expectedDesc=0x2168ef0,
> randomAccess=0 '\000') at /home/andres/src/postgresql/src/backend/executor/execQual.c:2211
> #6 0x0000000000616992 in FunctionNext (node=node(at)entry=0x2168210) at /home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:94
> #7 0x00000000005ffdcb in ExecScanFetch (recheckMtd=0x6166f0 <FunctionRecheck>, accessMtd=0x616700 <FunctionNext>, node=0x2168210)
> at /home/andres/src/postgresql/src/backend/executor/execScan.c:95
> #8 ExecScan (node=node(at)entry=0x2168210, accessMtd=accessMtd(at)entry=0x616700 <FunctionNext>, recheckMtd=recheckMtd(at)entry=0x6166f0 <FunctionRecheck>)
> at /home/andres/src/postgresql/src/backend/executor/execScan.c:145
> #9 0x00000000006169e4 in ExecFunctionScan (node=node(at)entry=0x2168210) at /home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:268
>
> the error happened just after I restarted a standby, so it's not
> unlikely to be related to hot_standby_feedback.

After some off-list discussion and debugging, Andres and I have
managed to identify three issues here (so far). Two are issues in the
testing, and one is a data-corrupting bug in the freeze map code.

1. pg_check_visible keeps on using the same OldestXmin for all its
checks even though the real OldestXmin may advance in the meantime.
This can lead to spurious problem reports: pg_check_visible() thinks
that the tuple isn't all visible yet and reports it as corruption, but
in reality there's no problem.

2. pg_check_visible includes the same check for heap-xmin-committed
that vacuumlazy.c uses, but hint bits aren't crash safe, so this could
lead to a spurious trouble report in a scenario involving a crash.

3. vacuumlazy.c includes this code:

if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
MultiXactCutoff, &frozen[nfrozen]))
frozen[nfrozen++].offset = offnum;
else if (heap_tuple_needs_eventual_freeze(tuple.t_data))
all_frozen = false;

That's wrong, because a "true" return value from
heap_prepare_freeze_tuple() means only that it has done *some*
freezing work on the tuple, not that it's done all of the freezing
work that will ever need to be done. So, if the tuple's xmin can be
frozen and is aborted but not older than vacuum_freeze_min_age, then
heap_prepare_freeze_tuple() won't free xmax, but the page will still
be marked all-frozen, which is bad. I think it normally won't matter
because the xmax will probably be hinted invalid anyway, since we just
pruned the page which should have set hint bits everywhere, but if
those hint bits were lost then we'd eventually end up with an
accessible xmax pointing off into space.

My first thought was to just delete the "else" but that would be bad
because we'd fail to set all-frozen immediately in a lot of cases
where we should. This needs a bit more thought than I have time to
give it right now.

(I will update on the status of this open item again no later than
Monday; probably sooner.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-06-10 20:12:48 Re: parallel.c is not marked as test covered
Previous Message Pavel Stehule 2016-06-10 18:59:16 Re: proposal: PL/Pythonu - function ereport