Re: fdatasync performance problem with large number of DB files

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Paul Guo <paulguo(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Paul Guo <guopa(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Brown <michael(dot)brown(at)discourse(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fdatasync performance problem with large number of DB files
Date: 2021-03-18 11:05:11
Message-ID: CA+hUKGJpKUMRqurMCkf+zy1WrH9WMZTWiMPu-JOmpsbsT9UhFQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 17, 2021 at 11:42 PM Paul Guo <paulguo(at)gmail(dot)com> wrote:
> I just quickly reviewed the patch (the code part). It looks good. Only
> one concern or question is do_syncfs() for symlink opened fd for
> syncfs() - I'm not 100% sure.

Alright, let me try to prove that it works the way we want with an experiment.

I'll make a directory with a file in it, and create a symlink to it in
another filesystem:

tmunro@x1:~/junk$ mkdir my_wal_dir
tmunro@x1:~/junk$ touch my_wal_dir/foo
tmunro@x1:~/junk$ ln -s /home/tmunro/junk/my_wal_dir /dev/shm/my_wal_dir_symlink
tmunro@x1:~/junk$ ls /dev/shm/my_wal_dir_symlink/
foo
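
As a cheap cross-check before tracing any block I/O, fstat() can confirm that a descriptor opened through the symlink refers to the real directory, since open() without O_NOFOLLOW resolves a trailing symlink. This is only a sketch: the helper name fd_resolves_to() is made up for illustration, and the paths passed to it would be the ones from the setup above.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): open "path" the same way the
 * test program does, then compare the device and inode of the resulting
 * descriptor with those of "target".
 *
 * Returns 1 if they are the same object, 0 if not, -1 on error.
 */
static int
fd_resolves_to(const char *path, const char *target)
{
    struct stat via_fd;
    struct stat direct;
    int fd;

    assert(path != NULL && target != NULL);

    /* No O_NOFOLLOW, so a trailing symlink is followed. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &via_fd) < 0) {
        close(fd);
        return -1;
    }
    close(fd);

    /* stat() also follows symlinks, so "target" may be given either way. */
    if (stat(target, &direct) < 0)
        return -1;

    return via_fd.st_dev == direct.st_dev && via_fd.st_ino == direct.st_ino;
}
```

With the setup above, fd_resolves_to("/dev/shm/my_wal_dir_symlink", "/home/tmunro/junk/my_wal_dir") would be expected to return 1. That only shows which inode the descriptor refers to; whether syncfs() on it really flushes the right filesystem is what the trace below is for.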

Now I'll write a program that repeatedly dirties the first block of
foo, and calls syncfs() on the containing directory that it opened
using the symlink:

tmunro@x1:~/junk$ cat test.c
#define _GNU_SOURCE

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    int symlink_fd, file_fd;

    /* Open the directory through the symlink that lives in /dev/shm. */
    symlink_fd = open("/dev/shm/my_wal_dir_symlink", O_RDONLY);
    if (symlink_fd < 0) {
        perror("open1");
        return EXIT_FAILURE;
    }

    /* Open the file itself through its real path. */
    file_fd = open("/home/tmunro/junk/my_wal_dir/foo", O_RDWR);
    if (file_fd < 0) {
        perror("open2");
        return EXIT_FAILURE;
    }

    for (int i = 0; i < 4; ++i) {
        /* Dirty the first block of foo (10 bytes at offset 0). */
        if (pwrite(file_fd, "hello world", 10, 0) != 10) {
            perror("pwrite");
            return EXIT_FAILURE;
        }
        /* Ask for a flush via the fd that was opened through the symlink. */
        if (syncfs(symlink_fd) < 0) {
            perror("syncfs");
            return EXIT_FAILURE;
        }
        sleep(1);
    }
    return EXIT_SUCCESS;
}
tmunro@x1:~/junk$ cc test.c
tmunro@x1:~/junk$ ./a.out

While that's running, to prove that it does what we want it to do,
I'll first find out where foo lives on the disk:

tmunro@x1:~/junk$ /sbin/xfs_bmap my_wal_dir/foo
my_wal_dir/foo:
0: [0..7]: 242968520..242968527

Now I'll trace the writes going to block 242968520, and start the program again:

tmunro@x1:~/junk$ sudo btrace /dev/nvme0n1p2 | grep 242968520
259,0 4 93 4.157000669 724924 A W 244019144 + 8 <- (259,2) 242968520
259,0 2 155 5.158446989 718635 A W 244019144 + 8 <- (259,2) 242968520
259,0 7 23 6.163765728 724924 A W 244019144 + 8 <- (259,2) 242968520
259,0 7 30 7.169112683 724924 A W 244019144 + 8 <- (259,2) 242968520
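
To read the trace: assuming the usual blkparse layout for remap ("A") events, the number after "<-" is the sector relative to partition (259,2), which matches what xfs_bmap reported, and the number before "+ 8" is the sector on the whole device. Below is a rough sketch of pulling those fields out of one trace event given as a single line. parse_remap_line() and struct remap_event are hypothetical names, and the format string is an assumption based only on the lines above.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of interest from one remap ("A") event; names are illustrative. */
struct remap_event {
    unsigned long long device_sector;    /* sector on the whole device */
    unsigned int       nsectors;         /* length of the request in sectors */
    unsigned long long partition_sector; /* sector relative to the partition */
};

/*
 * Parse a line shaped like:
 *   259,0 4 93 4.157000669 724924 A W 244019144 + 8 <- (259,2) 242968520
 * Returns 0 on success, -1 if the line does not match that shape.
 */
static int
parse_remap_line(const char *line, struct remap_event *ev)
{
    int matched;

    assert(line != NULL && ev != NULL);

    /* Skip device, CPU, sequence, timestamp, PID and the RWBS field. */
    matched = sscanf(line,
                     "%*d,%*d %*d %*d %*f %*d A %*s %llu + %u <- (%*d,%*d) %llu",
                     &ev->device_sector, &ev->nsectors, &ev->partition_sector);

    return matched == 3 ? 0 : -1;
}
```

For the lines above, device_sector minus partition_sector comes to 1050624 sectors, which would be where the partition starts on the device if that reading of the format is right. The length of 8 sectors lines up with the [0..7] extent from xfs_bmap, so each of the four events is a write of foo's first block, one per syncfs() call.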
