| From: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
|---|---|
| To: | 第108次明天 <87326549(at)qq(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC) |
| Date: | 2025-12-18 08:40:56 |
| Message-ID: | c4e25cb1ce97f866915a9451b9e15fbe7672457f.camel@cybertec.at |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
On Thu, 2025-12-18 at 14:42 +0800, 第108次明天 wrote:
> Environment Information
> * PostgreSQL Version: 14.5
> * OS: Debian GNU/Linux 11 (bullseye)
> * Architecture: aarch64 (8-core CPU)
> * Memory: 8GB
> * Storage Medium: eMMC
> Problem Description
> Data corruption occurred in the tb_workstation_message table (record guid: ce04bd3f-232d-4c7f-9a91-540d0e581649).
> Queries targeting this record fail with:
> ERROR: invalid memory alloc request size 18446744073709551613
> Corruption details of the record:
> * Only the guid field remains valid;
> * All int-type fields (e.g., upload_status) are set to 0;
> * All timestamp-type fields (e.g., create_time) are reset to 2000-01-01 00:00:00;
> * String-type fields (e.g., operator, content) are inaccessible (queryable only after forcing UPDATE to null).
> Investigation Details 1. Database Logs: No output between 2025-12-01 05:37:05.932 ~ 09:02:31.533 (no abnormalities found).
> 2. Dynamic Library Check: libpq.so had environment variable misconfiguration (not pointing to PG install dir),
> but the business app does NOT depend on this library — low corruption probability.
> 3. eMMC Storage Analysis:
> - Used dd to extract the corrupted block; hex analysis shows only guid bytes are valid (others are 0,
> confirming physical data corruption).
> 4. fstrim Check:
> - fstrim executed at 2025-12-01 00:27 (asynchronous, runs on disk idle); anomaly occurred after this.
> - Reproduction attempt: Simulated data insertion + 10+ fstrim runs — corruption not reproduced.
> Impact * Business failure: Query errors trigger business alerts.
> * Backup failure: Database backups are affected by corrupted data.
> Additional Notes * No system/app/PG restarts during the anomaly window.
> * No writes to tb_workstation_message after 2025-12-01 05:34:00.
> * No online operations/terminals between 05:34:00 ~ 09:02:31.
> Request:
> We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community
> help analyze the root cause, or guide further troubleshooting?
"fstrim" should not destroy data...
I am wondering how this problem can lead to backup failure - unless you are using "pg_dump" for backups.
You probably figured out that you will have to delete the row to get rid of the problem.
In addition, you should dump and restore the database to a new cluster - who knows what
other data corruption is lurking there.
I don't know what caused your problem, but I would suspect hardware failure.
Perhaps you should check or replace the disk.
Yours,
Laurenz Albe
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Richard Guo | 2025-12-18 08:56:08 | Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized) |
| Previous Message | 第108次明天 | 2025-12-18 06:42:56 | [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC) |