From: | digoal(at)126(dot)com |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #14575: Standby recovery process call close() very slow, when drop (many small files) database on Primary. |
Date: | 2017-03-03 02:32:46 |
Message-ID: | 20170303023246.25054.66379@wrigleys.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 14575
Logged by: Zhou Digoal
Email address: digoal(at)126(dot)com
PostgreSQL version: 9.6.2
Operating system: CentOS 6.x x64
Description:
HI,
I have a user case, which will create many table (and index) in
PostgreSQL, about 3000*150 tables, and 2.25 million small file in
database.
I also use stream replication create a physical standby.
And in this case, user will create and drop database/schema daily, so
postgresql must create and drop about 3000*150 tables, and 2.25 million
small file in database every day.
in this case, i found postgresql standby will delay, when checkpoint, when
drop database, when drop schema.
I use perf, pstack see standby's startup, checkpoint process, it do
close() syscall very slow.
but why checkpoint and drop database|schema so fast in primary?
Best regards, digoal.
test case:
1\.
```
create database tmp1 with template template0;
```
2\.
```
\c tmp1
create table public.test(
c0 serial unique check(c0>0),
c1 serial unique check(c1>0),
c2 serial unique check(c2>0),
c3 serial unique check(c3>0),
c4 serial ,
c5 serial ,
c6 serial ,
c7 serial ,
c8 serial ,
c9 serial ,
c10 serial ,
c11 serial ,
c12 serial ,
c13 serial ,
c14 serial ,
c15 serial ,
c16 serial ,
c17 serial ,
c18 serial ,
c19 serial ,
c20 serial ,
c21 serial ,
c22 serial ,
c23 serial ,
c24 serial ,
c25 serial ,
c26 serial ,
c27 serial ,
c28 serial ,
c29 serial ,
c30 serial ,
c31 serial ,
c32 serial ,
c33 serial ,
c34 serial ,
c35 serial ,
c36 serial ,
c37 serial ,
c38 serial ,
c39 serial
);
```
3\.
```
for ((i=1;i<=3000;i++)) ; do psql tmp1 -c "create schema s$i" ; done
```
this step , standby no delay.
```
select
pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_insert_location(),sent_location)),
pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_insert_location(),replay_location))
from pg_stat_replication ;
\watch 1
```
4\. create table now
```
vi test.sh
for ((x=1;x<=150;x++)) ; do (for ((i=1;i<=3000;i++)) ; do psql tmp1 -c
"create table if not exists s$i.test$x (like public.test including all)" ;
done) & done
chmod 500 test.sh
nohup ./test.sh >/dev/null 2>&1 &
```
this step, standby also no delay.
```
select count(*) from pg_class;
2.25 million record.
```
```
\l+
52 GB
```
5\. now checkpoint in primary
```
checkpoint;
very quick end;
```
on standby :
checkpoint process CPU 100%,run about 7 hours.
strace -p checkpoint, many close() syscall, and very slow.
perf
```
perf record -avg -p checkpoint PID
perf report --stdio
99.74% postgres postgres [.] remove_from_unowned_list
|
--- remove_from_unowned_list
smgrclose
smgrcloseall
CheckpointerMain
AuxiliaryProcessMain
StartChildProcess
sigusr1_handler
__restore_rt
PostmasterMain
startup_hacks
__libc_start_main
```
and bgwriter perf
```
99.78% postgres postgres [.] remove_from_unowned_list
|
--- remove_from_unowned_list
smgrclose
smgrcloseall
BackgroundWriterMain
AuxiliaryProcessMain
StartChildProcess
sigusr1_handler
__restore_rt
PostmasterMain
startup_hacks
__libc_start_main
```
but checkpoint don't conflict with startup, so postgresql standby don't
delay now.
```
close(149466) = 0
close(137674) = 0
close(895987) = 0
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x7facea633e3c) = 217400424
close(438148) = 0
close(541088) = 0
```
this step , i found when i use very large
max_files_per_process=1024000,checkpointer process in standby can quick stop
( restart standby ).
6\. create another big database (many small files)
```
create database test with template tmp1;
```
this time, standby also no delay .
this is standby's startup process strace result.
```
lstat("base/16384/12976402", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
open("base/16384/12976402", O_RDONLY) = 12
open("base/24795111/12976402", O_RDWR|O_CREAT|O_EXCL, 0600) = 13
read(12, "", 65536) = 0
close(13) = 0
close(12) = 0
lstat("base/16384/305700", {st_mode=S_IFREG|0600, st_size=8192, ...}) = 0
open("base/16384/305700", O_RDONLY) = 12
open("base/24795111/305700", O_RDWR|O_CREAT|O_EXCL, 0600) = 13
read(12, "\0\0\0\0\30p\234+\0\0\0\0000\0\360\37\360\37\4
\0\0\0\0b1\5\0\2\0\0\0"..., 65536) = 8192
write(13, "\0\0\0\0\30p\234+\0\0\0\0000\0\360\37\360\37\4
\0\0\0\0b1\5\0\2\0\0\0"..., 8192) = 8192
sync_file_range(0xd, 0, 0x2000, 0x2) = 0
read(12, "", 65536) = 0
close(13) = 0
close(12) = 0
lstat("base/16384/639437", {st_mode=S_IFREG|0600, st_size=8192, ...}) = 0
```
7\. drop one schema
```
\c test
drop schema s1 cascade;
```
standby's startup process very slow, delay .
8\. drop one database
```
create database test1 with template tmp1;
drop database test;
drop database test1;
quick on primary
```
standby will delay now, startup process very slow, and CPU 100%,
```
99.76% postgres postgres [.] remove_from_unowned_list
|
--- remove_from_unowned_list
smgrclose
smgrcloseall
XLogDropDatabase
dbase_redo
StartupXLOG
StartupProcessMain
AuxiliaryProcessMain
StartChildProcess
PostmasterMain
startup_hacks
__libc_start_main
```
From | Date | Subject | |
---|---|---|---|
Next Message | from-postgresql | 2017-03-03 17:56:26 | BUG #14576: JDBC's PreparedStatement.getParameterMetaData fails with 'could not determine polymorphic type' |
Previous Message | Denise Wiedl | 2017-03-02 23:50:33 | Re: BUG #14573: lateral joins, ambuiguity |