Corruption during WAL replay

From: Teja Mupparti <tejeswarm(at)hotmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Daniel Wood <hexexpert(at)comcast(dot)net>
Subject: Corruption during WAL replay
Date: 2020-03-23 20:56:59
Message-ID: BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
Lists: pgsql-hackers

This is my *first* attempt to submit a Postgres patch, so please let me know if I missed any step of the process or got the patch format wrong. (I used https://wiki.postgresql.org/wiki/Working_with_Git as a reference.)

The original bug report and the relevant discussion are here:

https://www.postgresql.org/message-id/20191207001232.klidxnm756wqxvwx%40alap3.anarazel.de

https://www.postgresql.org/message-id/822113470.250068.1573246011818%40connect.xfinity.com

https://www.postgresql.org/message-id/20191206230640.2dvdjpcgn46q3ks2%40alap3.anarazel.de

https://www.postgresql.org/message-id/1880.1281020817@sss.pgh.pa.us

The crux of the fix: in the current code, the engine drops the buffers and then truncates the file, and a crash after the buffer drop but before the truncate causes the corruption. The patch reverses the order, i.e. it truncates the file first and drops the buffers afterwards.
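
To picture the window described above, here is a minimal toy model in Python. It is not PostgreSQL code: the class ToyRelation, the function run, and the page contents are all hypothetical names made up for illustration, and real crash recovery (WAL replay, checkpoints) is considerably more involved. It only shows that the two orderings leave different on-disk states if the process dies between the two steps.

```python
class ToyRelation:
    """Toy model (hypothetical): 'disk' is what survives a crash,
    'buffers' holds newer page versions that were never written out."""

    def __init__(self, disk_pages):
        self.disk = list(disk_pages)
        self.buffers = {}  # block number -> newer in-memory contents

    def drop_buffers(self, new_nblocks):
        # Discard cached pages at or beyond the new end WITHOUT flushing them.
        for blk in [b for b in self.buffers if b >= new_nblocks]:
            del self.buffers[blk]

    def truncate_file(self, new_nblocks):
        # Shorten the on-disk file to new_nblocks pages.
        del self.disk[new_nblocks:]

    def crash(self):
        # Everything held only in memory is lost.
        self.buffers.clear()


def run(order, new_nblocks, crash_after_first_step):
    """Apply the two steps in the given order and return the on-disk pages."""
    rel = ToyRelation(["live", "stale tuple"])
    rel.buffers[1] = "emptied by vacuum"  # dirty page, never flushed
    steps = {"drop": rel.drop_buffers, "truncate": rel.truncate_file}
    for i, name in enumerate(order):
        steps[name](new_nblocks)
        if crash_after_first_step and i == 0:
            rel.crash()
            break
    return rel.disk


# Current order: the old on-disk version of block 1 survives the crash.
print(run(("drop", "truncate"), 1, crash_after_first_step=True))
# Reversed order: the file is already short when the crash happens.
print(run(("truncate", "drop"), 1, crash_after_first_step=True))
```

In this sketch the drop-then-truncate order leaves the outdated copy of block 1 on disk after the crash, while the truncate-then-drop order does not. Whether the reversed order is safe in every other respect is a question for the patch itself, not something this toy model can show.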

Warm regards,

Teja

Attachment: bug-fix-patch (application/octet-stream, 10.4 KB)
