[BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)

From: 第108次明天 <87326549(at)qq(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)
Date: 2025-12-18 06:42:56
Message-ID: tencent_BA7C5BFF421FD1965E088C9E12CDD7EB6A06@qq.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

To: pgsql-bugs(at)postgresql(dot)org

Subject: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)

Environment Information

PostgreSQL Version: 14.5

OS: Debian GNU/Linux 11 (bullseye)

Architecture: aarch64 (8-core CPU)

Memory: 8GB

Storage Medium: eMMC

Problem Description
Data corruption occurred in the tb_workstation_message&nbsp;table (record guid: ce04bd3f-232d-4c7f-9a91-540d0e581649). Queries targeting this record fail with:

plaintext

ERROR: invalid memory alloc request size 18446744073709551613

Corruption details of the record:

Only the guid&nbsp;field remains valid;

All int-type fields (e.g., upload_status) are set to 0;

All timestamp-type fields (e.g., create_time) are reset to 2000-01-01 00:00:00;

String-type fields (e.g., operator, content) are inaccessible (queryable only after forcing UPDATE&nbsp;to null).

Timeline

Record insertion: 2025-11-25 21:23:51

Last modification: 2025-11-27 09:41:10

Anomaly window: 2025-12-01 03:00:00 ~ 09:02:31 (record was intact before 03:00:00)

First error detected: 2025-12-01 09:02:31 (triggered by a 30s-interval business cron job querying upload_status = 0)

Error Output (Query Example)
sql

postgres=# select * from tb_workstation_message where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649'; ERROR: invalid memory alloc request size 18446744073709551613

Investigation Details

Database Logs: No output between 2025-12-01 05:37:05.932 ~ 09:02:31.533 (no abnormalities found).

Dynamic Library Check: libpq.so&nbsp;had environment variable misconfiguration (not pointing to PG install dir), but the business app does NOT depend on this library — low corruption probability.

eMMC Storage Analysis:

Used dd&nbsp;to extract the corrupted block; hex analysis shows only guid&nbsp;bytes are valid (others are 0, confirming physical data corruption).

fstrim Check:

fstrim&nbsp;executed at 2025-12-01 00:27 (asynchronous, runs on disk idle); anomaly occurred after this.

Reproduction attempt: Simulated data insertion + 10+ fstrim runs — corruption not reproduced.

Impact

Business failure: Query errors trigger business alerts.

Backup failure: Database backups are affected by corrupted data.

Additional Notes

No system/app/PG restarts during the anomaly window.

No writes to tb_workstation_message&nbsp;after 2025-12-01 05:34:00.

No online operations/terminals between 05:34:00 ~ 09:02:31.

Request:

We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community help analyze the root cause, or guide further troubleshooting?

第108次明天
87326549(at)qq(dot)com


从QQ邮箱发来的超大附件

postgresql_2025-12-01_000000(1).csv (739.5KB, 2026年1月17日 14:42) 进入下载页面 :https://wx.mail.qq.com/ftn/download?func=3&k=c6cd4a393619073df4be1b3961663338e7bf3e3963663338111d585f4c530b5e0e05060d534b57090f54180d025650155805500f4e5f030f0153575d5a5302590b1135490c15475f4b5746480f3901080b071808514b03096602050953560310081b1b5a1010272c6b8d6cf59bcb17826e22a5c9c7abe665fb3a453863663338393235&key=c6cd4a393619073df4be1b3961663338e7bf3e3963663338111d585f4c530b5e0e05060d534b57090f54180d025650155805500f4e5f030f0153575d5a5302590b1135490c15475f4b5746480f3901080b071808514b03096602050953560310081b1b5a1010272c6b8d6cf59bcb17826e22a5c9c7abe665fb3a453863663338393235&code=9259cf38&from=

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Laurenz Albe 2025-12-18 08:40:56 Re: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)
Previous Message Tom Lane 2025-12-17 21:37:24 Re: BUG #19358: Short circuit optimization exists in generic plan but missed in custom plan