Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-10-06 21:09:12
Message-ID: 1507324152711.78083@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

>On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>> I don't know if it's really the freeze map at fault or something else.
>>
>> Ideally, it would be possible to effectively disable the new freeze
>> map stuff in a minimal way, for testing purposes. Perhaps the authors
> of that patch, CC'd, can suggest a way to do that.

>Actually, the simplest thing might be to just use pg_visibility's
>pg_check_frozen() to check that the visibility/freeze map accurately
>summarizes the all-frozen status of tuples in the heap. If that
>doesn't indicate that there is corruption, we can be fairly confident
>that the problem is elsewhere. The metadata in the visibility/freeze
>map should be accurate when a bit is set to indicate that an entire
>heap page is all-frozen (or, separately, all-visible). We can hardly
>expect it to have better information that the authoritative source of
>truth, the heap itself.

>The more I think about it, the more I tend to doubt that the remaining
>problems are with the freeze map. If the freeze map was wrong, and
>incorrectly said that a page was all-frozen, then surely the outward
>symptoms would take a long time to show up, as they always do when we
>accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM
>that that's the only meaningful way that the freeze map can be wrong
>-- it only promises to be accurate when it says that no further
>freezing is needed for a page/bit.

Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6.
None of the all-frozen or all-visible bits are necessarily set in problematic pages.

ERROR: failed to find parent tuple for heap-only tuple at (0,182) in table "accounts"

blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | f
(1 row)

Even when the bits were set, I haven't found issues with the pg_check_xxx functions in the dozens of times I've run them.

postgres=# select * from pg_visibility('accounts'::regclass);
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | f
1 | t | t | t
(2 rows)

postgres=# select pg_check_visible('accounts'::regclass);
pg_check_visible
------------------
(0 rows)

postgres=# select pg_check_frozen('accounts'::regclass);
pg_check_frozen
-----------------
(0 rows)

Yi Wen

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Geoghegan 2017-10-06 22:15:07 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Robert Haas 2017-10-06 19:33:22 pgsql: Copy information from the relcache instead of pointing to it.

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2017-10-06 22:15:07 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Badrul Chowdhury 2017-10-06 21:07:51 Re: Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)