| From: | TAKATSUKA Haruka <harukat(at)sraoss(dot)co(dot)jp> | 
|---|---|
| To: | pgsql-bugs(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption | 
| Date: | 2019-12-20 02:00:28 | 
| Message-ID: | 20191220110028.471b95ff8b9443046d9603a4@sraoss.co.jp | 
| Lists: | pgsql-bugs | 
I found that moving the DropRelFileNodeBuffers() call from the top of
smgrtruncate() to its end is a proper fix. With that change, PostgreSQL
passes both the regression tests and the reproduction test below.
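
To illustrate why the position of the buffer drop matters, here is a minimal toy model in C. It is a sketch under stated assumptions, not PostgreSQL code: ToyRel, toy_drop_buffers(), toy_file_truncate(), toy_truncate() and the per-block tuple counts are all hypothetical names and numbers chosen for illustration. It assumes that after VACUUM's heap pass the cleaned pages exist only as dirty buffers, while the files on disk still hold the tuples written at the last checkpoint.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 4

/* Toy relation: live-tuple count per block, on disk and in the buffer pool. */
typedef struct {
    int  disk[NBLOCKS];    /* page contents as last written to disk */
    int  nblocks_on_disk;  /* current file length in blocks */
    int  cache[NBLOCKS];   /* buffered (possibly dirty) page contents */
    bool cached[NBLOCKS];  /* is the block present in the buffer pool? */
} ToyRel;

/* State after CHECKPOINT, DELETE and VACUUM's heap pass: disk still holds
 * the old tuples, the buffers hold the cleaned pages (hypothetical counts). */
static void toy_init(ToyRel *r)
{
    for (int i = 0; i < NBLOCKS; i++) {
        r->disk[i] = 100;
        r->cache[i] = (i == 0) ? 50 : 0;
        r->cached[i] = true;
    }
    r->nblocks_on_disk = NBLOCKS;
}

/* Discard buffers at or beyond new_nblocks WITHOUT writing them out. */
static void toy_drop_buffers(ToyRel *r, int new_nblocks)
{
    for (int i = new_nblocks; i < NBLOCKS; i++)
        r->cached[i] = false;
}

/* Physical truncation; 'fail' simulates the injected error. */
static bool toy_file_truncate(ToyRel *r, int new_nblocks, bool fail)
{
    if (fail)
        return false;
    r->nblocks_on_disk = new_nblocks;
    return true;
}

/* Truncate with the buffer drop either before or after the file truncation. */
static bool toy_truncate(ToyRel *r, int new_nblocks, bool drop_first, bool fail)
{
    assert(new_nblocks >= 0 && new_nblocks <= NBLOCKS);
    if (drop_first)
        toy_drop_buffers(r, new_nblocks);
    if (!toy_file_truncate(r, new_nblocks, fail))
        return false;  /* with drop_first, the dirty pages are already gone */
    if (!drop_first)
        toy_drop_buffers(r, new_nblocks);
    return true;
}

/* Count visible tuples: use the buffer if cached, otherwise read from disk. */
static int toy_count(const ToyRel *r)
{
    int total = 0;
    for (int i = 0; i < r->nblocks_on_disk; i++)
        total += r->cached[i] ? r->cache[i] : r->disk[i];
    return total;
}
```

In this model, calling toy_truncate(&r, 1, true, true) after toy_init(&r) leaves toy_count(&r) at 350, because the stale on-disk pages become visible again. With drop_first set to false, the same failed truncation leaves the count at 50, since the cleaned buffers are still in place.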
with best regards,
Haruka Takatsuka / SRA OSS, Inc. Japan
On Fri, 20 Dec 2019 10:19:52 +0900
TAKATSUKA Haruka <harukat(at)sraoss(dot)co(dot)jp> wrote:
> I also confirmed that PostgreSQL with the attached patch avoids this data
> corruption. The patch simply removes the DropRelFileNodeBuffers() call
> from smgrtruncate().
> 
> 
> On Thu, 19 Dec 2019 07:14:42 +0000
> PG Bug reporting form <noreply(at)postgresql(dot)org> wrote:
> 
> > The following bug has been logged on the website:
> > 
> > Bug reference:      16172
> > Logged by:          TAKATSUKA Haruka
> > Email address:      harukat(at)sraoss(dot)co(dot)jp
> > PostgreSQL version: 12.1
> > Operating system:   Windows/Linux
> > Description:        
> > 
> > Hello, pgsql hackers,
> > 
> > I found that a failure of vacuum file truncation can cause permanent
> > data corruption.
> > The steps to reproduce it are given below.
> > 
> > On Windows installations, the truncation sometimes fails with a
> > permission-denied error because of anti-virus software. It has raised
> > only an ERROR, and people have often dismissed it.
> > 
> > A truncation failure can also make the standby panic with the following
> > messages when replaying Heap2/VISIBLE or Heap2/CLEAN records, because the
> > truncation WAL record is emitted even if the truncation does not actually
> > complete on the primary.
> > 
> >  WARNING:  page .. of relation base/..../.... does not exist
> >  CONTEXT:  WAL redo at ..... for ....: cutoff xid ... flags ...
> >  PANIC:  WAL contains references to invalid pages
> > 
> > I think a truncation failure should be handled at a more severe level.
> > Any thoughts?
> > 
> > with best regards,
> > Haruka Takatsuka / SRA OSS, Inc. Japan
> > 
> > 
> > Steps to reproduce (PG12)
> > =========================
> > 
> > $ psql -U postgres -d db1
> > Pager usage is off.
> > psql (12.1)
> > Type "help" for help.
> > 
> > db1=# 
> > 
> >   $ gdb -p {its backend process}
> > 
> >   (gdb) b FileTruncate
> >   Breakpoint 1 at 0x73d320: file fd.c, line 2057.
> >   (gdb) c
> >   Continuing.
> > 
> > db1=# SHOW autovacuum;
> >  autovacuum
> > ------------
> >  off
> > (1 row)
> > 
> > db1=# CREATE TABLE t1 (id int primary key, v text);
> > CREATE TABLE
> > 
> > db1=# INSERT INTO t1 SELECT g, md5(g::text) FROM generate_series(1, 10000) as g;
> > INSERT 0 10000
> > 
> > db1=# CHECKPOINT;
> > 
> >   Program received signal SIGUSR1, User defined signal 1.
> >   0x00000036caae91a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
> >   (gdb) c
> >   Continuing.
> > 
> > CHECKPOINT
> > 
> > db1=# DELETE FROM t1 WHERE id > 50;
> > DELETE 9950
> > 
> > db1=# VACUUM t1;
> > 
> >   Breakpoint 1, FileTruncate (file=59, offset=8192, wait_event_info=167772175)
> >       at fd.c:2057
> >   2057    {
> >   (gdb) n
> >   2065            returnCode = FileAccess(file);
> >   (gdb) n
> >   2066            if (returnCode < 0)
> >   (gdb) p returnCode = -100
> >   $6 = -100
> >   (gdb) c
> >   Continuing.
> > 
> > ERROR:  could not truncate file "base/16384/16645" to 1 blocks: Success
> > 
> > db1=# SELECT count(*) FROM t1;
> >  count
> > -------
> >   9930
> > (1 row)
> > 
> (snip)
______________________________________________________________________
 TAKATSUKA Haruka  harukat(at)sraoss(dot)co(dot)jp  SRA OSS, Inc. http://www.sraoss.co.jp
 2-32-8 Minami-Ikebukuro, Toshima-ku, Tokyo 171-0022, Japan
 TEL: 03-5979-2701  FAX: 03-5979-2702  CellPhone: 080-1292-3396
| Attachment | Content-Type | Size | 
|---|---|---|
| 12stable_move_bufferdrop.diff | text/plain | 1.1 KB | 