Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject: Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-09-29 06:04:12
Message-ID: CAB7nPqSeFiDcWsrGNbeDx=AxCibRYfiOH4Ag+DQtmhQL=SVRog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Fri, Sep 29, 2017 at 6:39 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Peter Geoghegan wrote:
>
>> We certainly do still see wrong answers to queries here:
>>
>> postgres=# select ctid, xmin, xmax, * from t;
>> ctid | xmin | xmax | id | name | x
>> -------+-------+------+----+------+---
>> (0,1) | 21171 | 0 | 1 | 111 | 0
>> (0,7) | 21177 | 0 | 3 | 333 | 5
>> (2 rows)
>>
>> postgres=# select * from t where id = 3;
>> id | name | x
>> ----+------+---
>> 3 | 333 | 5
>> (1 row)
>>
>> postgres=# set enable_seqscan = off;
>> SET
>> postgres=# select * from t where id = 3;
>> id | name | x
>> ----+------+---
>> (0 rows)
>
> Yeah, oops.

This really looks like a problem at heap-level with the parent
redirection not getting defined (?). Please note that a subsequent
REINDEX fails as well:
=# reindex index t_pkey ;
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597

VACUUM FREEZE also is not getting things right, but a VACUUM FULL does.

Also, dropping the constraint and attempting to recreate it is failing:
=# alter table t drop constraint t_pkey;
ALTER TABLE
=# create index t_pkey on t(id);
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597

A corrupted page clearly indicates that there are no tuple redirections:
=# select lp, t_ctid, lp_off, t_infomask, t_infomask2 from
heap_page_items(get_raw_page('t', 0));
lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
1 | (0,1) | 8152 | 2818 | 3
2 | null | 0 | null | null
3 | (0,4) | 8112 | 9986 | 49155
4 | (0,5) | 8072 | 9986 | 49155
5 | (0,6) | 8032 | 9986 | 49155
6 | (0,7) | 7992 | 9986 | 49155
7 | (0,7) | 7952 | 11010 | 32771
(7 rows)

And a non-corrupted page clearly shows the redirection done with lp_off:
lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
1 | (0,1) | 8152 | 2818 | 3
2 | null | 7 | null | null
3 | null | 0 | null | null
4 | null | 0 | null | null
5 | null | 0 | null | null
6 | null | 0 | null | null
7 | (0,7) | 8112 | 11010 | 32771
(7 rows)
--
Michael

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2017-09-29 12:58:15 pgsql: Add PostgreSQL version to coverage output
Previous Message Peter Geoghegan 2017-09-28 22:24:47 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-29 06:06:51 Re: Bug with pg_basebackup and 'shared' tablespace
Previous Message Pavan Deolasee 2017-09-29 05:39:00 Re: pgbench stuck with 100% cpu usage