Autovacuum worker doesn't immediately exit on postmaster death

From: Alexander Kukushkin <cyberdemn@gmail.com>
To: PostgreSQL-development <pgsql-hackers@postgresql.org>, Stephen Frost <sfrost@snowman.net>
Subject: Autovacuum worker doesn't immediately exit on postmaster death
Date: 2020-10-28 18:44:18
Message-ID: CAFh8B=kcdk8k-Y21RfXPu5dX=bgPqJ8TC3p_qxR_ygdBS=JN5w@mail.gmail.com
Lists: pgsql-hackers

Hello,

I know nobody in their right mind should do this, but if the postmaster
process is killed with SIGKILL, most backend processes correctly notice
the postmaster's absence and exit. There is one exception, though:
autovacuum worker processes keep running until they eventually finish
their work and exit on their own.

Steps to reproduce:
1. Initialize the cluster and start it up

2. Create a table and fill it up with some data:
localhost/postgres=# create table foo(id int);
CREATE TABLE
localhost/postgres=# alter table foo set (autovacuum_vacuum_cost_delay = 100);
ALTER TABLE
localhost/postgres=# insert into foo select * from generate_series(1, 10000000);
INSERT 0 10000000
localhost/postgres=# \dt+ foo
List of relations
Schema │ Name │ Type │ Owner │ Persistence │ Size │ Description
────────┼──────┼───────┼──────────┼─────────────┼────────┼─────────────
public │ foo │ table │ postgres │ permanent │ 346 MB │
(1 row)

3. Wait until an autovacuum worker process has started, then kill the postmaster:
$ ps auxwwwf | grep [p]ostgres
akukush+  7728 0.0 0.1 321244 26836 ? S  15:51 0:00 postgres -D data
akukush+  7730 0.0 0.0 173488  4312 ? Ss 15:51 0:00  \_ postgres: logger
akukush+  7732 0.0 0.0 321584  8808 ? Ss 15:51 0:00  \_ postgres: checkpointer
akukush+  7733 0.0 0.4 321376 70688 ? Ss 15:51 0:00  \_ postgres: background writer
akukush+  7734 0.0 0.0 321244  9780 ? Ss 15:51 0:00  \_ postgres: walwriter
akukush+  7735 0.0 0.0 321796  6684 ? Ss 15:51 0:00  \_ postgres: autovacuum launcher
akukush+  7736 0.0 0.0 175608  6224 ? Ss 15:51 0:00  \_ postgres: archiver last was 0000002E000000000000002A
akukush+  7737 0.0 0.0 175608  4340 ? Ss 15:51 0:00  \_ postgres: stats collector
akukush+  7738 0.0 0.0 321672  6812 ? Ss 15:51 0:00  \_ postgres: logical replication launcher
akukush+  7743 0.0 0.0 322460 14624 ? Ss 15:52 0:00  \_ postgres: postgres postgres 127.0.0.1(39130) idle

$ ps auxwwwf | grep [p]ostgres | grep auto
akukush+  7735 0.0 0.0 321796  6684 ? Ss 15:51 0:00  \_ postgres: autovacuum launcher
akukush+ 10483 1.0 0.0 322432 12472 ? Ss 16:28 0:00  \_ postgres: autovacuum worker postgres

$ kill -9 7728

$ ps auxwwwf | grep [p]ostgres | grep auto
akukush+ 10483 0.7 0.0 322432 12472 ? Ss 16:28 0:00 postgres: autovacuum worker postgres

And here is gdb backtrace:
(gdb) bt
#0 0x00007f6e1c80c0f7 in __GI___select (nfds=nfds@entry=0,
readfds=readfds@entry=0x0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fff6cf1b580)
at ../sysdeps/unix/sysv/linux/select.c:41
#1 0x000055dbd93ade2d in pg_usleep (microsec=<optimized out>) at
./build/../src/port/pgsleep.c:56
#2 0x000055dbd90b7543 in vacuum_delay_point () at
./build/../src/backend/commands/vacuum.c:2034
#3 0x000055dbd8f5c00d in lazy_scan_heap (aggressive=false,
nindexes=0, Irel=0x0, vacrelstats=<optimized out>,
params=0x55dbdaac7e7c, onerel=<optimized out>) at
./build/../src/backend/access/heap/vacuumlazy.c:1034
#4 heap_vacuum_rel (onerel=<optimized out>, params=0x55dbdaac7e7c,
bstrategy=<optimized out>) at
./build/../src/backend/access/heap/vacuumlazy.c:518
#5 0x000055dbd90b561d in table_relation_vacuum (bstrategy=<optimized
out>, params=0x55dbdaac7e7c, rel=0x7f6e1f3105f0) at
./build/../src/include/access/tableam.h:1460
#6 vacuum_rel (relid=16396, relation=<optimized out>,
params=params@entry=0x55dbdaac7e7c) at
./build/../src/backend/commands/vacuum.c:1893
#7 0x000055dbd90b68c5 in vacuum (relations=0x55dbdab38588,
params=params@entry=0x55dbdaac7e7c, bstrategy=<optimized out>,
bstrategy@entry=0x55dbdaac7f98, isTopLevel=isTopLevel@entry=true) at
./build/../src/backend/commands/vacuum.c:449
#8 0x000055dbd8f0777b in autovacuum_do_vac_analyze
(bstrategy=0x55dbdaac7f98, tab=0x55dbdaac7e78) at
./build/../src/backend/postmaster/autovacuum.c:3137
#9 do_autovacuum () at ./build/../src/backend/postmaster/autovacuum.c:2467
#10 0x000055dbd8f07e2a in AutoVacWorkerMain (argv=0x0, argc=0) at
./build/../src/backend/postmaster/autovacuum.c:1694
#11 0x000055dbd91a753a in StartAutoVacWorker () at
./build/../src/backend/postmaster/autovacuum.c:1488
#12 0x000055dbd91b54ca in StartAutovacuumWorker () at
./build/../src/backend/postmaster/postmaster.c:5613
#13 sigusr1_handler (postgres_signal_arg=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:5320
#14 <signal handler called>
#15 0x00007f6e1c80c0f7 in __GI___select (nfds=nfds@entry=10,
readfds=readfds@entry=0x7fff6cf1c7f0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fff6cf1c750)
at ../sysdeps/unix/sysv/linux/select.c:41
#16 0x000055dbd91b5759 in ServerLoop () at
./build/../src/backend/postmaster/postmaster.c:1703
#17 0x000055dbd91b75e3 in PostmasterMain (argc=17, argv=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:1412
#18 0x000055dbd8f0a3c8 in main (argc=17, argv=0x55dbdaa4aef0) at
./build/../src/backend/main/main.c:210

Fifteen minutes later, process 10483 was still alive, but the backtrace
looked a bit different (it had finished vacuuming and was running ANALYZE):
(gdb) bt
#0 0x00007f6e1c80c0f7 in __GI___select (nfds=nfds@entry=0,
readfds=readfds@entry=0x0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fff6cf1b480)
at ../sysdeps/unix/sysv/linux/select.c:41
#1 0x000055dbd93ade2d in pg_usleep (microsec=<optimized out>) at
./build/../src/port/pgsleep.c:56
#2 0x000055dbd90b7543 in vacuum_delay_point () at
./build/../src/backend/commands/vacuum.c:2034
#3 0x000055dbd9037251 in acquire_sample_rows
(onerel=onerel@entry=0x7f6e1f310070, elevel=elevel@entry=13,
rows=rows@entry=0x7f6e1f2b8048, targrows=targrows@entry=30000,
totalrows=totalrows@entry=0x7fff6cf1b6d8,
totaldeadrows=totaldeadrows@entry=0x7fff6cf1b6e0)
at ./build/../src/backend/commands/analyze.c:1079
#4 0x000055dbd9039d51 in do_analyze_rel
(onerel=onerel@entry=0x7f6e1f310070,
params=params@entry=0x55dbdaac7e7c, va_cols=va_cols@entry=0x0,
acquirefunc=0x55dbd9037110 <acquire_sample_rows>, relpages=44248,
inh=inh@entry=false, in_outer_xact=false, elevel=13)
at ./build/../src/backend/commands/analyze.c:522
#5 0x000055dbd903b452 in analyze_rel (relid=<optimized out>,
relation=<optimized out>, params=params@entry=0x55dbdaac7e7c,
va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>)
at ./build/../src/backend/commands/analyze.c:263
#6 0x000055dbd90b6884 in vacuum (relations=0x55dbdab38588,
params=params@entry=0x55dbdaac7e7c, bstrategy=<optimized out>,
bstrategy@entry=0x55dbdaac7f98, isTopLevel=isTopLevel@entry=true) at
./build/../src/backend/commands/vacuum.c:466
#7 0x000055dbd8f0777b in autovacuum_do_vac_analyze
(bstrategy=0x55dbdaac7f98, tab=0x55dbdaac7e78) at
./build/../src/backend/postmaster/autovacuum.c:3137
#8 do_autovacuum () at ./build/../src/backend/postmaster/autovacuum.c:2467
#9 0x000055dbd8f07e2a in AutoVacWorkerMain (argv=0x0, argc=0) at
./build/../src/backend/postmaster/autovacuum.c:1694
#10 0x000055dbd91a753a in StartAutoVacWorker () at
./build/../src/backend/postmaster/autovacuum.c:1488
#11 0x000055dbd91b54ca in StartAutovacuumWorker () at
./build/../src/backend/postmaster/postmaster.c:5613
#12 sigusr1_handler (postgres_signal_arg=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:5320
#13 <signal handler called>
#14 0x00007f6e1c80c0f7 in __GI___select (nfds=nfds@entry=10,
readfds=readfds@entry=0x7fff6cf1c7f0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fff6cf1c750)
at ../sysdeps/unix/sysv/linux/select.c:41
#15 0x000055dbd91b5759 in ServerLoop () at
./build/../src/backend/postmaster/postmaster.c:1703
#16 0x000055dbd91b75e3 in PostmasterMain (argc=17, argv=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:1412
#17 0x000055dbd8f0a3c8 in main (argc=17, argv=0x55dbdaa4aef0) at
./build/../src/backend/main/main.c:210

Eventually, after 20-25 minutes, process 10483 exited.

I was able to reproduce this with 13.0 and 12.4, and I believe older
versions are also affected.

Regards,
--
Alexander Kukushkin
