Skip site navigation (1) Skip section navigation (2)

Re: Deleted WAL files held open by backends in Linux

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deleted WAL files held open by backends in Linux
Date: 2009-11-30 18:20:12
Message-ID: 4B13B87C020000250002CDC2@gw.wicourts.gov (view raw or flat)
Thread:
Lists: pgsql-hackers
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>> It seemed strange that the only backends which were holding open
>> deleted WAL files were ones where the connection was established
>> with a login which has no write permissions.
>
> A backend would never open a WAL file unless it had to write a WAL
> record, so I'm having a hard time believing that these were
> totally read-only transactions.  Can you give specifics?

Here's some information captured before the original post:

MILWAUKEE-PG:~ # lsof | grep deleted
COMMAND     PID    USER   FD      TYPE             DEVICE       SIZE   
   NODE NAME
postgres  13105    ccsa   10u      REG                8,4   16777216 
268442394 /var/pgsql/data/cc/pg_xlog/00000001000000E0000000BB (deleted)
postgres  30500    ccsa    4u      REG                8,4   16777216 
268442626 /var/pgsql/data/cc/pg_xlog/00000001000000E20000003F (deleted)
postgres  30501    ccsa    5u      REG                8,4   16777216 
283509014 /var/pgsql/data/cc/pg_xlog/00000001000000E200000016 (deleted)
MILWAUKEE-PG:~ # ps auxf|grep ^ccsa
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
COMMAND
ccsa     30490  0.0  0.0 290988  8608 ?        S    Oct24   0:13
/usr/local/pgsql-8.3.7/bin/postgres -D /var/pgsql/data/cc
ccsa     30491  0.0  0.0  14524  1056 ?        Ss   Oct24   0:10  \_
postgres: logger process
ccsa     30493  0.0  1.0 291876 268760 ?       Ss   Oct24   1:36  \_
postgres: writer process
ccsa     30494  0.0  0.0 291120  1452 ?        Ss   Oct24   0:01  \_
postgres: wal writer process
ccsa     30495  0.0  0.0 291552  1564 ?        Ss   Oct24   0:00  \_
postgres: autovacuum launcher process
ccsa     30496  0.0  0.0  14612   952 ?        Ss   Oct24   0:01  \_
postgres: archiver process   last was 00000001000000E20000005B
ccsa     30497  0.0  0.0  15280  1576 ?        Ss   Oct24   6:37  \_
postgres: stats collector process
ccsa     30500  0.0  0.9 292448 231844 ?       Ss   Oct24  10:42  \_
postgres: viewer cc 127.0.0.1(36846) idle
ccsa     30501  0.0  0.8 292496 209428 ?       Ss   Oct24  10:28  \_
postgres: viewer cc 165.219.78.11(53940) idle
ccsa      3107  0.0  0.0 292508  5836 ?        Ss   Nov09   0:32  \_
postgres: cc cc 127.0.0.1(40442) idle
ccsa      3113  0.0  1.1 305304 286280 ?       Ss   Nov09   6:51  \_
postgres: cc cc 127.0.0.1(49969) idle
ccsa      3119  0.0  0.0 292508  5836 ?        Ss   Nov09   0:29  \_
postgres: cc cc 127.0.0.1(54127) idle
ccsa     13105  0.0  0.0 293396 23852 ?        Ss   Nov23   0:08  \_
postgres: viewer cc 165.219.90.178(51481) idle
ccsa      1485  0.8  1.1 312400 291508 ?       Ss   Nov24  14:18  \_
postgres: cc cc 127.0.0.1(42692) idle
ccsa     10752  0.3  1.1 312712 290856 ?       Ss   Nov24   4:54  \_
postgres: cc cc 127.0.0.1(45119) idle
ccsa     10908  1.3  1.1 313120 292836 ?       Ss   Nov24  20:03  \_
postgres: cc cc 127.0.0.1(42065) idle
ccsa     25099  0.7  1.1 312864 285048 ?       Ss   14:02   1:03  \_
postgres: cc cc 127.0.0.1(50361) idle
ccsa     26571  0.0  0.3 298912 89528 ?        Ss   14:29   0:06  \_
postgres: cc cc 127.0.0.1(52213) idle
ccsa     26809  0.1  0.3 298940 88816 ?        Ss   14:33   0:06  \_
postgres: cc cc 127.0.0.1(59145) idle
ccsa     26812  0.6  1.1 310060 274192 ?       Ss   14:33   0:44  \_
postgres: cc cc 127.0.0.1(59962) idle
MILWAUKEE-PG:~ # ll /var/pgsql/data/cc/pg_xlog/
total 1671172
-rw------- 1 ccsa users      253 2009-11-20 21:29
00000001000000E00000000A.00001810.backup
-rw------- 1 ccsa users 16777216 2009-11-25 16:10
00000001000000E20000005B
-rw------- 1 ccsa users 16777216 2009-11-25 16:22
00000001000000E20000005C
-rw------- 1 ccsa users 16777216 2009-11-25 04:15
00000001000000E20000005D
-rw------- 1 ccsa users 16777216 2009-11-25 05:15
00000001000000E20000005E
-rw------- 1 ccsa users 16777216 2009-11-25 06:15
00000001000000E20000005F
-rw------- 1 ccsa users 16777216 2009-11-25 07:15
00000001000000E200000060
-rw------- 1 ccsa users 16777216 2009-11-25 07:42
00000001000000E200000061
[...]
-rw------- 1 ccsa users 16777216 2009-11-25 15:34
00000001000000E2000000BB
-rw------- 1 ccsa users 16777216 2009-11-25 15:37
00000001000000E2000000BC
-rw------- 1 ccsa users 16777216 2009-11-25 15:44
00000001000000E2000000BD
-rw------- 1 ccsa users 16777216 2009-11-25 15:50
00000001000000E2000000BE
-rw------- 1 ccsa users 16777216 2009-11-25 15:57
00000001000000E2000000BF
-rw------- 1 ccsa users 16777216 2009-11-25 16:04
00000001000000E2000000C0
drwx------ 2 ccsa users       94 2009-11-25 16:12 archive_status

You will note that the connections logged in as "viewer" (and only
those) are holding open a deleted WAL file.  This user has not been
granted anything except SELECT permissions to any tables.  In
addition, immediately after creating the database, we ran:

  REVOKE CREATE ON DATABASE cc FROM public;
  REVOKE CREATE ON SCHEMA public FROM public;

Two of these connections are from software publishers, which only
issue SELECT statements against three tables (to which they only
have SELECT permission).  The other is from monitoring software,
which only runs SELECT against C functions from our fsutil software
(which can be seen on pgfoundry).  All are JDBC connections.  I
suspect that autoCommit is on; if you think it matters, I can go
review the code to confirm.

On one of our development machines, with a number of PostgreSQL
databases, I found about 60 such WAL files deleted and held open
by backends logged in under the read-only user.  It wasted my time
primarily because I was looking for situations where we might have
deleted a directory tree without first stopping the process using
it; while 60 file handles and 1GB of disk space is no big deal for
us, it caused me to investigate the issue to confirm there weren't
zombie PostgreSQL processes out there.

At a minimum, we should add the extra 16MB per connection that might
be taken on the WAL file system to the calculations people should do
when sizing that, just in case someone is trying to cut that thin
while planning on using a lot of connections.

-Kevin

In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2009-11-30 18:29:34
Subject: Re: lexeme ordering in tsvector
Previous:From: Joshua D. DrakeDate: 2009-11-30 18:16:17
Subject: Re: Block-level CRC checks

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group