Re: lseek/read/write overhead becomes visible at scale ..

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tobias Oberstein <tobias(dot)oberstein(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: lseek/read/write overhead becomes visible at scale ..
Date: 2017-01-24 18:11:21
Message-ID: 20170124181121.pgk7kqfkq4dd3hpo@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-01-24 18:57:47 +0100, Tobias Oberstein wrote:
> Am 24.01.2017 um 18:41 schrieb Andres Freund:
> > On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:
> > > The syscall overhead is visible in production too .. I watched PG using perf
> > > live, and lseeks regularly appear at the top of the list.
> >
> > Could you show such perf profiles? That'll help us.
>
> oberstet(at)bvr-sql18:~$ psql -U postgres -d adr
> psql (9.5.4)
> Type "help" for help.
>
> adr=# select * from svc_sqlbalancer.f_perf_syscalls();
> NOTICE: starting Linux perf syscalls sampling - be patient, this can take
> some time ..
> NOTICE: sudo /usr/bin/perf stat -e "syscalls:sys_enter_*" -x ";" -a
> sleep 30 2>&1
>  pid |                syscall                |   cnt   | cnt_per_sec
> -----+---------------------------------------+---------+-------------
>      | syscalls:sys_enter_lseek              | 4091584 |      136386
>      | syscalls:sys_enter_newfstat           | 2054988 |       68500
>      | syscalls:sys_enter_read               |  767990 |       25600
>      | syscalls:sys_enter_close              |  503803 |       16793
>      | syscalls:sys_enter_newstat            |  434080 |       14469
>      | syscalls:sys_enter_open               |  380382 |       12679
>      | syscalls:sys_enter_mmap               |  301491 |       10050
>      | syscalls:sys_enter_munmap             |  182313 |        6077
>      | syscalls:sys_enter_getdents           |  162443 |        5415
>      | syscalls:sys_enter_rt_sigaction       |  158947 |        5298
>      | syscalls:sys_enter_openat             |   85325 |        2844
>      | syscalls:sys_enter_readlink           |   77439 |        2581
>      | syscalls:sys_enter_rt_sigprocmask     |   60929 |        2031
>      | syscalls:sys_enter_mprotect           |   58372 |        1946
>      | syscalls:sys_enter_futex              |   49726 |        1658
>      | syscalls:sys_enter_access             |   40845 |        1362
>      | syscalls:sys_enter_write              |   39513 |        1317
>      | syscalls:sys_enter_brk                |   33656 |        1122
>      | syscalls:sys_enter_epoll_wait         |   23776 |         793
>      | syscalls:sys_enter_ioctl              |   19764 |         659
>      | syscalls:sys_enter_wait4              |   17371 |         579
>      | syscalls:sys_enter_newlstat           |   13008 |         434
>      | syscalls:sys_enter_exit_group         |   10135 |         338
>      | syscalls:sys_enter_recvfrom           |    8595 |         286
>      | syscalls:sys_enter_sendto             |    8448 |         282
>      | syscalls:sys_enter_poll               |    7200 |         240
>      | syscalls:sys_enter_lgetxattr          |    6477 |         216
>      | syscalls:sys_enter_dup2               |    5790 |         193
>
> <snip>
>
> Note: there isn't a lot of load currently (this is from production).

That doesn't really mean that much - sure, it shows that lseek is
frequent, but it doesn't tell you how much impact this has on the
overall workload. For that you'd need a generic (i.e. not syscall
tracepoint, but CPU cycle) perf profile, and you'd look in the call
graph (via perf report --children) at how much of that is below the
lseek syscall.
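
To make the "frequent is not the same as expensive" point concrete, here
is a back-of-the-envelope sketch. The call rate is the lseek figure from
the output quoted above; the per-call costs are purely hypothetical
placeholders (nothing here was measured), which is exactly the gap a
cycle profile would close.

```python
# Rate taken from the perf stat output quoted above (lseek, 30 s sample).
LSEEK_PER_SEC = 136386

# ASSUMPTION: hypothetical per-call CPU costs in microseconds. These are
# not measurements from this system; only a cycle profile gives real values.
ASSUMED_COST_US = (0.1, 0.5, 2.0)


def core_fraction(calls_per_sec: float, cost_us: float) -> float:
    """Fraction of one CPU core consumed by calls at the given per-call cost."""
    return calls_per_sec * cost_us / 1_000_000


for cost in ASSUMED_COST_US:
    share = core_fraction(LSEEK_PER_SEC, cost)
    print(f"{cost:4.1f} us/call -> {share:6.1%} of one core")
```

The same call count maps to very different CPU shares depending on the
assumed cost, so the tracepoint counts alone cannot settle the question.
The real figure would have to come from something like
`perf record -a -g -- sleep 30` followed by `perf report --children`.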

> > > > I'm much less against this change than Tom, but doing an artificial syscall
> > > > microbenchmark seems unlikely to make a big case for using it in
> > >
> > > This isn't a syscall benchmark, but FIO.
> >
> > There's not really a difference between those, when you use fio to
> > benchmark seek vs pseek.
>
> Sorry, I don't understand what you are talking about.

Fio, as you appear to have used it, is a microbenchmark that measures
individual syscalls.
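
For illustration, a syscall microbenchmark of this kind boils down to
something like the sketch below. It is not what fio does internally; the
helper names, file size and repeat count are assumptions chosen for the
example, and it relies on `os.pread`, which is only available on Unix.
It compares a two-call lseek+read against a single pread on a small file
that stays in the page cache.

```python
import os
import tempfile
import time

BLOCK = 8192      # PostgreSQL's default page size
N_BLOCKS = 256    # ASSUMPTION: small file that stays in the page cache
ROUNDS = 20       # ASSUMPTION: arbitrary repeat count


def read_with_lseek(fd: int, blockno: int) -> bytes:
    """Two syscalls: position the file offset, then read."""
    os.lseek(fd, blockno * BLOCK, os.SEEK_SET)
    return os.read(fd, BLOCK)


def read_with_pread(fd: int, blockno: int) -> bytes:
    """One syscall: read at an explicit offset, file offset untouched."""
    return os.pread(fd, BLOCK, blockno * BLOCK)


def time_reader(reader, fd: int) -> float:
    """Wall-clock seconds to read every block ROUNDS times with `reader`."""
    start = time.perf_counter()
    for _ in range(ROUNDS):
        for blockno in range(N_BLOCKS):
            reader(fd, blockno)
    return time.perf_counter() - start


with tempfile.TemporaryFile() as f:
    f.write(os.urandom(BLOCK * N_BLOCKS))
    f.flush()
    fd = f.fileno()
    # Both paths must return the same bytes for the same block.
    same = read_with_lseek(fd, 3) == read_with_pread(fd, 3)
    t_lseek = time_reader(read_with_lseek, fd)
    t_pread = time_reader(read_with_pread, fd)

print(f"identical data: {same}")
print(f"lseek+read: {t_lseek:.4f}s  pread: {t_pread:.4f}s")
```

A loop like this measures little besides syscall entry/exit cost on
cached data, which is why its numbers say little about a workload where
the read is one small part of a much larger operation.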

> > > > postgres, where it's part of vastly more expensive operations (like
> > > > actually reading data afterwards, exclusive locks, ...).
> > >
> > > PG is very CPU hungry, yes.
> >
> > Indeed - working on it ;)
> >
> >
> > > But there are quite a few system-related effects
> > > too .. e.g. we've managed to bring the system load down with huge pages (big
> > > improvement).
> >
> > Glad to hear it.
>
> With 3TB RAM, huge pages are absolutely essential (otherwise, the system bogs
> down in TLB etc. overhead).

I was one of the people working on adding hugepage support to pg, that's
why I was glad ;)

Regards,

Andres
