Broken hint bits (freeze)

From: Dmitriy Sarafannikov <dsarafannikov(at)yandex(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Borodin Vladimir <root(at)simply(dot)name>
Subject: Broken hint bits (freeze)
Date: 2017-05-23 17:20:46
Message-ID: DA18C5E1-A115-4C1C-9F7C-E7B9A5F3EBC5@yandex.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi hackers,

We have some problems on our production with hint bits and frozen tuples.
More and more following errors began to appear on master after switchover:
ERROR: 58P01: could not access status of transaction 1952523525
DETAIL: Could not open file "pg_clog/0746": No such file or directory.
LOCATION: SlruReportIOError, slru.c:896

We investigated the problem with pageinspect and found the tuples that are the cause:

xdb311g(master)=# select * from mytable where ctid = '(4,21)';
ERROR: 58P01: could not access status of transaction 1951521353
DETAIL: Could not open file "pg_clog/0745": No such file or directory.
LOCATION: SlruReportIOError, slru.c:896

But the same query successfully executed on replica.

We found some difference in hint bits between master and replica:

xdb311g(master)=# SELECT t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32) FROM heap_page_items(get_raw_page(‘mytable',4)) where lp=21;
-[ RECORD 1 ]------------------------------
t_xmin | 1951521353
?column? | 00000000000000000000000000000000

old master, now replica:
xdb311e(replica)=# SELECT t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32) FROM heap_page_items(get_raw_page(‘mytable',4)) where lp=21;
-[ RECORD 1 ]------------------------------
t_xmin | 1951521353
?column? | 00000000000000000000001100000000

X’0300’ = HEAP_XMIN_FROZEN according to

#define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
#define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)

xdb311g(master)=# select relfrozenxid from pg_class where relname = ‘mytable';
relfrozenxid
--------------
2266835605
(1 row)

This tuple must be frozen but there are no set bits HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID on master

Another interesting thing that LSN of this page on master and replica are not the same:
xdb311g(master)=# select lsn from page_header(get_raw_page(‘mytable',4));
lsn
---------------
8092/6A26DD08
(1 row)

xdb311e(replica)=# select lsn from page_header(get_raw_page(‘mytable',4));
lsn
---------------
838D/C4A0D280
(1 row)

And LSN on replica is greater that LSN on master (838D/C4A0D280 > 8092/6A26DD08)
How can this be possible?

We wrote a query which returns ctid of frozen tuples, which must be frozen but not actually frozen.

xdb311e(replica)=# select t_ctid from generate_series(0, pg_relation_size(‘mytable')/8192 - 1 ) s(i) left join lateral heap_page_items(get_raw_page(‘mytable',s.i::int)) on true where t_xmin::text::bigint < (select relfrozenxid::text::bigint from pg_class where relname = ‘mytable') and t_infomask & X'0300'::int < 1;
t_ctid
-----------
(400,16)
(2837,71)
(2837,72)
(2837,73)
(2837,75)
(2837,76)
(3042,40)
(4750,80)
(4750,81)
(5214,60)
(5214,65)
(6812,31)
(6912,63)
(7329,8)
(7374,26)
(7374,27)
(16 rows)
Same query on master returns 317 rows.

Our thoughts:
1) We think that it is related to switchover.
2) Any WAL-logged modification of this page on master will replace this page on replica due to full page writes.
And all replicas will have broken hint bits too. It’s dangerous.

Where to dig further?

RHEL6, PostgreSQL 9.6.3, wal_log_hints=off, full_page_writes=on, fsync=on, checksums disabled.
We don’t think that it is any hardware-related problems because this databases started from 9.4
and they survived 2 upgrades with pg_upgrade. And any hardware-related problems was not detected.
Problem appears not only in this shard.
Size of each shard is around 5TB and we can’t provide data.

Regards
Dmitriy Sarafannikov

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2017-05-23 17:42:41 Re: walsender & parallelism
Previous Message Tom Lane 2017-05-23 13:45:14 Re: Getting server crash after running sqlsmith