TOAST corruption in standby database

From: Alex Adriaanse <alex(at)oseberg(dot)io>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: TOAST corruption in standby database
Date: 2019-10-24 20:20:00
Message-ID: SN6PR03MB3598FEA9F556D12A25956D0AA96A0@SN6PR03MB3598.namprd03.prod.outlook.com
Lists: pgsql-hackers

We have a primary and two hot standby databases running Postgres 11.3 inside Docker, with their data directories bind-mounted from a reflink-enabled XFS filesystem. The VM is running Debian's 4.19.16-1~bpo9+1 kernel on an AWS EC2 instance.

A few months ago I saw TOAST corruption in a ~44GB table in one of the standby databases, so I wiped that database and rebuilt it using pg_basebackup, which eliminated the corruption. This week corruption has popped up again in the same table in one of the standby databases; the other standby database shows no corruption.

The corrupted table has four columns of types integer, text, text, and jsonb. The corruption happens inside the jsonb column.
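
For reference, the table looks roughly like this (the table name and the first two column names below are placeholders, not the real schema; identity and json are the columns referenced in the queries further down):

-- Placeholder schema; the jsonb column is the one whose TOAST data is corrupted.
CREATE TABLE some_table (
    id       integer,   -- placeholder name
    identity text,
    name     text,      -- placeholder name
    json     jsonb
);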

The corruption manifests itself as follows in the standby database:

SELECT length(json::text) FROM <table> WHERE identity = '...';
ERROR: missing chunk number 0 for toast value 64265646 in pg_toast_16103925

SELECT ctid, chunk_id, chunk_seq, md5(chunk_data) FROM pg_toast.pg_toast_16103925 WHERE chunk_id = 64265646;
ctid | chunk_id | chunk_seq | md5
------+----------+-----------+-----
(0 rows)

SELECT count(1) FROM pg_toast.pg_toast_16103925 WHERE chunk_id = 64265646;
count
-------
2
(1 row)
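
My guess, which I haven't verified, is that the count(1) above is being satisfied by an index-only scan on the TOAST index, so it never notices that the heap tuples are gone. Something like the following should confirm or refute that:

-- Guess: the count(1) may be served by an index-only scan on the TOAST
-- index, which would not notice missing heap tuples. Check the plan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(1) FROM pg_toast.pg_toast_16103925 WHERE chunk_id = 64265646;

-- Then force heap access and recount (session-local setting):
SET enable_indexonlyscan = off;
SELECT count(1) FROM pg_toast.pg_toast_16103925 WHERE chunk_id = 64265646;
RESET enable_indexonlyscan;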

Looking up the TOAST block that is supposed to contain this value, you can see that the TOAST tuples are missing on the standby:

SELECT ctid, chunk_id, chunk_seq, md5(chunk_data) FROM pg_toast.pg_toast_16103925 WHERE ctid IN ('(1793121,1)', '(1793121,2)', '(1793121,3)', '(1793121,4)', '(1793121,5)', '(1793121,6)', '(1793121,7)');
ctid | chunk_id | chunk_seq | md5
-------------+----------+-----------+----------------------------------
(1793121,3) | 41259162 | 0 | 1bff36f306bac135cce9da44dd6d6bbb
(1793121,4) | 41259162 | 1 | b754d688c5c847c7bc519e65741ffef1
(1793121,5) | 41259163 | 0 | 10dfa4f5b3e32188f0b4b28c9be76abe
(1793121,6) | 41259163 | 1 | 7dceb98b2c2f4ac3c72245c58c85323f
(4 rows)
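
To look at the line pointers in that block directly on the standby, something along these lines with the pageinspect extension should work (sketch only; I haven't included its output here):

-- Sketch: dump the line pointers of TOAST heap block 1793121 (requires
-- superuser and the pageinspect extension).
CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT lp, lp_off, lp_len, lp_flags, t_xmin, t_xmax, t_ctid
FROM heap_page_items(get_raw_page('pg_toast.pg_toast_16103925', 1793121));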

For comparison, here are the same queries against the primary database:

SELECT length(json::text) FROM <table> WHERE identity = '...';
length
--------
7817
(1 row)

SELECT ctid, chunk_id, chunk_seq, md5(chunk_data) FROM pg_toast.pg_toast_16103925 WHERE chunk_id = 64265646;
ctid | chunk_id | chunk_seq | md5
-------------+----------+-----------+----------------------------------
(1793121,1) | 64265646 | 0 | a9a2642e8408fc178fb809b86c430997
(1793121,2) | 64265646 | 1 | 371bc2628ac5bfc8b37d32f93d08fefe
(2 rows)

SELECT count(1) FROM pg_toast.pg_toast_16103925 WHERE chunk_id = 64265646;
count
-------
2
(1 row)

SELECT ctid, chunk_id, chunk_seq, md5(chunk_data) FROM pg_toast.pg_toast_16103925 WHERE ctid IN ('(1793121,1)', '(1793121,2)', '(1793121,3)', '(1793121,4)', '(1793121,5)', '(1793121,6)', '(1793121,7)');
ctid | chunk_id | chunk_seq | md5
-------------+----------+-----------+----------------------------------
(1793121,1) | 64265646 | 0 | a9a2642e8408fc178fb809b86c430997
(1793121,2) | 64265646 | 1 | 371bc2628ac5bfc8b37d32f93d08fefe
(1793121,3) | 41259162 | 0 | 1bff36f306bac135cce9da44dd6d6bbb
(1793121,4) | 41259162 | 1 | b754d688c5c847c7bc519e65741ffef1
(1793121,5) | 41259163 | 0 | 10dfa4f5b3e32188f0b4b28c9be76abe
(1793121,6) | 41259163 | 1 | 7dceb98b2c2f4ac3c72245c58c85323f
(6 rows)

Looking at the data file for the TOAST relation, I see that the header data structures in the relevant block look fine, which makes me think this is not caused by filesystem corruption (unless a write silently failed). The second half of that block is identical between the primary and the corrupted standby, but in the first half the corrupted standby is missing data.
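
For reference, the segment and skip values in the dd commands below come from block 1793121 of the ctids above, assuming the standard 8kB block size and 1GB segment size (131072 blocks per segment):

SELECT 1793121 / 131072 AS segment_file,     -- 13, i.e. file 16103928.13
       1793121 % 131072 AS block_in_segment; -- 89185, the dd skip value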

Standby (corrupted):

# dd if=data/base/18034/16103928.13 bs=8192 skip=89185 count=1 status=none | hexdump -C | head -8
00000000 a3 0e 00 00 48 46 88 0e 00 00 05 00 30 00 58 0f |....HF......0.X.|
00000010 00 20 04 20 00 00 00 00 00 00 00 00 00 00 00 00 |. . ............|
00000020 10 98 e0 0f 98 97 e8 00 a8 8f e0 0f 58 8f 96 00 |............X...|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000f50 00 00 00 00 00 00 00 00 32 b0 0a 01 00 00 00 00 |........2.......|
00000f60 00 00 00 00 1b 00 61 5c 06 00 03 00 02 09 18 00 |......a\........|
00000f70 9b 90 75 02 01 00 00 00 ac 00 00 00 83 9f 64 00 |..u...........d.|

Primary:

# dd if=data/base/18034/16103928.13 bs=8192 skip=89185 count=1 status=none | hexdump -C | head -8
00000000 bd 0e 00 00 08 ad 32 b7 00 00 05 00 30 00 90 04 |......2.....0...|
00000010 00 20 04 20 00 00 00 00 68 87 e0 0f 90 84 a8 05 |. . ....h.......|
00000020 10 98 e0 0f 98 97 e8 00 a8 8f e0 0f 58 8f 96 00 |............X...|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000490 a6 07 7e 02 00 00 00 00 00 00 00 00 1b 00 61 5c |..~...........a\|
000004a0 02 00 03 00 02 09 18 00 ae 9d d4 03 01 00 00 00 |................|
000004b0 d0 0a 00 00 23 25 10 07 88 02 13 0f 2c 04 78 01 |....#%......,.x.|
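
Rather than eyeballing the hexdump, the same header fields can be decoded with pageinspect on each server; roughly (sketch, output omitted):

-- Sketch: decode the page header (pd_lower, pd_upper, LSN, etc.) of block
-- 1793121 on the primary and on the standby for comparison.
SELECT lsn, checksum, flags, lower, upper, special, pagesize, prune_xid
FROM page_header(get_raw_page('pg_toast.pg_toast_16103925', 1793121));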

Based on the above observations, it seems that occasionally some changes either aren't being replicated to the standby database or aren't being persisted by it. In the past I've seen TCP packets get mangled or dropped between our EC2 instances, leading to sudden disconnects. The standby connects to the primary using SSL (sslmode=require sslcompression=1), so I would expect SSL to catch any network-level corruption and cause the connection to fail and reconnect. Outside of any SSL disconnects (which don't happen often), this database is stopped and restarted twice a week so we can clone it (using cp -a --reflink=always).
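
For what it's worth, this is roughly what I plan to check on the standby the next time it happens (sketch; PG 11 function and view names):

-- Sketch: WAL receive/replay positions and walreceiver state on the standby.
SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn(),
       pg_last_xact_replay_timestamp();

SELECT status, received_lsn, conninfo FROM pg_stat_wal_receiver;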

Any ideas on what might be causing this?

Thanks,

Alex
