Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption

From: TAKATSUKA Haruka <harukat(at)sraoss(dot)co(dot)jp>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption
Date: 2019-12-20 01:19:52
Message-ID: 20191220101952.1e07a9d6113896d3be1a31ea@sraoss.co.jp
Lists: pgsql-bugs


I also confirmed that PostgreSQL built with the attached patch avoids
this data corruption. The patch simply removes the
DropRelFileNodeBuffers() call from smgrtruncate().
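
To illustrate why the ordering matters, here is a toy Python model of the
reproduction steps quoted below. It is only a sketch and not PostgreSQL
code: ToyRelation, ROWS_PER_PAGE and the drop_buffers_first flag are
hypothetical names, and 120 rows per page is an assumed value chosen for
illustration. The "keep the buffers on failure" branch is a simplification
and does not claim to mirror what the attached patch does in every path.

```python
ROWS_PER_PAGE = 120  # assumed page capacity, for illustration only


class ToyRelation:
    """'disk' holds checkpointed pages; 'buffers' holds newer dirty copies."""

    def __init__(self, nrows):
        self.disk = {}
        for row_id in range(1, nrows + 1):
            block = (row_id - 1) // ROWS_PER_PAGE
            self.disk.setdefault(block, set()).add(row_id)
        self.buffers = {}

    def delete_where(self, predicate):
        # Deletes change only the buffered copies; disk keeps the old pages.
        for block, rows in self.disk.items():
            page = self.buffers.get(block, rows)
            self.buffers[block] = {r for r in page if not predicate(r)}

    def truncate(self, new_nblocks, fail, drop_buffers_first):
        doomed = [b for b in self.disk if b >= new_nblocks]
        if drop_buffers_first:
            # Buffers are discarded before we know the truncate succeeded.
            for b in doomed:
                self.buffers.pop(b, None)
        if fail:
            raise OSError("could not truncate file")
        for b in doomed:
            self.buffers.pop(b, None)
            del self.disk[b]

    def count(self):
        # A buffered page wins over the on-disk copy when both exist.
        return sum(len(self.buffers.get(b, rows)) for b, rows in self.disk.items())


def rows_visible_after_failed_truncate(drop_buffers_first):
    rel = ToyRelation(10000)              # state as of the CHECKPOINT
    rel.delete_where(lambda row_id: row_id > 50)
    try:
        rel.truncate(new_nblocks=1, fail=True,
                     drop_buffers_first=drop_buffers_first)
    except OSError:
        pass                              # the ERROR that tends to be dismissed
    return rel.count()


print(rows_visible_after_failed_truncate(True))   # stale disk pages reappear
print(rows_visible_after_failed_truncate(False))  # dirty buffers still win
```

In this model, dropping the buffers before the failed truncate leaves only
the stale on-disk pages for every block past the first, so the deleted
rows become visible again. Keeping the buffers leaves just the 50
surviving rows visible.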

On Thu, 19 Dec 2019 07:14:42 +0000
PG Bug reporting form <noreply(at)postgresql(dot)org> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 16172
> Logged by: TAKATSUKA Haruka
> Email address: harukat(at)sraoss(dot)co(dot)jp
> PostgreSQL version: 12.1
> Operating system: Windows/Linux
> Description:
>
> Hello, pgsql hackers,
>
> I found that a failure of vacuum file truncation can cause permanent data
> corruption.
> The steps to reproduce it are given below.
>
> On Windows installations, the truncation sometimes fails with a
> permission-denied error because of anti-virus software. It has been
> reported only as an ERROR, and people have often dismissed it.
>
> A truncation failure can also make the standby panic with the following
> messages when replaying Heap2/VISIBLE or Heap2/CLEAN, because the
> truncation WAL record is emitted even if the truncation does not actually
> complete on the primary.
>
> WARNING: page .. of relation base/..../.... does not exist
> CONTEXT: WAL redo at ..... for ....: cutoff xid ... flags ...
> PANIC: WAL contains references to invalid pages
>
> I think a truncation failure should be handled at a more severe level.
> Any thoughts?
>
> with best regards,
> Haruka Takatsuka / SRA OSS, Inc. Japan
>
>
> Steps to reproduce (PG12)
> =========================
>
> $ psql -U postgres -d db1
> Pager usage is off.
> psql (12.1)
> Type "help" for help.
>
> db1=#
>
> $ gdb -p {its backend process}
>
> (gdb) b FileTruncate
> Breakpoint 1 at 0x73d320: file fd.c, line 2057.
> (gdb) c
> Continuing.
>
> db1=# SHOW autovacuum;
> autovacuum
> ------------
> off
> (1 row)
>
> db1=# CREATE TABLE t1 (id int primary key, v text);
> CREATE
>
> db1=# INSERT INTO t1 SELECT g, md5(g::text) FROM generate_series(1, 10000)
> as g;
> INSERT 0 10000
>
> db1=# CHECKPOINT;
>
> Program received signal SIGUSR1, User defined signal 1.
> 0x00000036caae91a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
> (gdb) c
> Continuing.
>
> CHECKPOINT
>
> db1=# DELETE FROM t1 WHERE id > 50;
> DELETE 9950
>
> db1=# VACUUM t1;
>
> Breakpoint 1, FileTruncate (file=59, offset=8192,
> wait_event_info=167772175)
> at fd.c:2057
> 2057 {
> (gdb) n
> 2065 returnCode = FileAccess(file);
> (gdb) n
> 2066 if (returnCode < 0)
> (gdb) p returnCode = -100
> $6 = -100
> (gdb) c
> Continuing.
>
> ERROR: could not truncate file "base/16384/16645" to 1 blocks: Success
>
> db1=# SELECT count(*) FROM t1;
> count
> -------
> 9930
> (1 row)
>
(snip)
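
The standby failure mentioned in the quoted report can be sketched the same
way. This is a deliberately simplified toy model, not PostgreSQL's recovery
code: the record kinds, the replay() function and the 84-page starting
length are hypothetical, and the model raises an exception as soon as it
sees a missing page. Real recovery does not necessarily fail at that exact
point.

```python
class InvalidPage(Exception):
    """Stands in for 'PANIC: WAL contains references to invalid pages'."""


def replay(wal, nblocks):
    """Apply toy WAL records to a relation that currently has nblocks pages."""
    for kind, block in wal:
        if kind == "truncate":
            nblocks = min(nblocks, block)   # block = new length in pages
        elif block >= nblocks:
            raise InvalidPage(f"page {block} does not exist")
    return nblocks


def standby_panics(records, nblocks=84):
    try:
        replay(records, nblocks)
    except InvalidPage:
        return True
    return False


# Primary: the truncate record is logged, then the file truncation fails,
# so a later record still refers to a page beyond the logged length.
wal_after_failed_truncate = [("truncate", 1), ("clean", 5)]

print(standby_panics(wal_after_failed_truncate))  # standby truncated for real
print(standby_panics([("clean", 5)]))             # no truncate record logged
```

The point of the sketch is only that the standby acts on the logged
truncation while the primary's file keeps its old length, so the two sides
disagree about which pages exist.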

Attachment: 12stable_dont_drop_buffer.diff (text/plain, 725 bytes)
