Re: [HACKERS] kqueue

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Mateusz Guzik <mjguzik(at)gmail(dot)com>, Keith Fiske <keith(at)omniti(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Marko Tiikkaja <marko(at)joh(dot)to>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: [HACKERS] kqueue
Date: 2018-09-30 23:09:43
Message-ID: CAEepm=1YhBEH9FV_76k5GqzZcK4G+PF7_EqGc4eiMKswFtOYRg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 30, 2018 at 9:49 PM Matteo Beccati <php(at)beccati(dot)com> wrote:
> On 30/09/2018 04:36, Thomas Munro wrote:
> > On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati <php(at)beccati(dot)com> wrote:
> >> Out of curiosity, I've installed FreeBSD on an identically specced VM,
> >> and the select benchmark was ~75k tps for kqueue vs ~90k tps on
> >> unpatched master, so maybe there's something wrong I'm doing when
> >> benchmarking. Could you please provide proper instructions?
> >
> > Ouch. What kind of virtualisation is this? Which version of FreeBSD?
> > Not sure if it's relevant, but do you happen to see gettimeofday()
> > showing up as a syscall, if you truss a backend running pgbench?
>
> I downloaded 11.2 as VHD file in order to run on MS Hyper-V / Win10 Pro.
>
> Yes, I saw plenty of gettimeofday calls when running truss:
>
> > gettimeofday({ 1538297117.071344 },0x0) = 0 (0x0)
> > gettimeofday({ 1538297117.071743 },0x0) = 0 (0x0)
> > gettimeofday({ 1538297117.072021 },0x0) = 0 (0x0)

Ok. Those syscalls show up depending on your
kern.timecounter.hardware setting and virtualised hardware: just like
on Linux, gettimeofday() can be a cheap userspace operation (vDSO)
that avoids the syscall path, or not. I'm not seeing any reason to
think that's relevant here.

> > getpid() = 766 (0x2fe)
> > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x2b) = 0 (0x0)
> > gettimeofday({ 1538297117.072944 },0x0) = 0 (0x0)
> > getpid() = 766 (0x2fe)
> > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0)

That's setproctitle(). Those syscalls go away if you use FreeBSD 12
(which has setproctitle_fast()). If you fix both of those problems,
you are left with just:

> > sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,71,0,NULL,0) = 71 (0x47)
> > recvfrom(9,"B\0\0\0\^\\0P0_1\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 51 (0x33)

These are the only syscalls I see for each pgbench -S transaction on
my bare metal machine: just the network round trip. The funny thing
is ... there are almost no kevent() calls.

I managed to reproduce the regression (~70k -> ~50k) using a prewarmed
scale 10 select-only pgbench with 2GB of shared_buffers (so it all
fits), with -j 96 -c 96 on an 8 vCPU AWS t2.2xlarge running FreeBSD 12
ALPHA8. Here is what truss -c says, capturing data from one backend
for about 10 seconds:

syscall                     seconds   calls  errors
sendto                  0.396840146    3452       0
recvfrom                0.415802029    3443       6
kevent                  0.000626393       6       0
gettimeofday            2.723923249   24053       0
                      ------------- ------- -------
                        3.537191817   30954       6

(There's no regression with -j 8 -c 8; the problem appears only when
the machine is significantly overloaded, the same circumstances under
which Mateusz reported a great improvement.) So it very rarely
accesses the kqueue directly, yet its existence somehow slows things
down. Curiously, when using poll() it actually calls poll() ~90
times/sec for me:

syscall                     seconds   calls  errors
sendto                  0.352784808    3226       0
recvfrom                0.614855254    4125     916
poll                    0.319396480     916       0
gettimeofday            2.659035352   22456       0
                      ------------- ------- -------
                        3.946071894   30723     916
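
To make the two backend summaries easier to compare, here is a rough
Python sketch that turns them into per-transaction ratios. The numbers
are copied from the truss -c output above; the 10 second window is an
assumption ("about 10 seconds"), and so is treating each sendto() as
one completed transaction. The names summarize, kqueue_run and
poll_run are made up for illustration.

```python
# Figures copied from the two backend truss -c summaries above:
# syscall -> (seconds, calls).
kqueue_run = {
    "sendto": (0.396840146, 3452),
    "recvfrom": (0.415802029, 3443),
    "kevent": (0.000626393, 6),
    "gettimeofday": (2.723923249, 24053),
}
poll_run = {
    "sendto": (0.352784808, 3226),
    "recvfrom": (0.614855254, 4125),
    "poll": (0.319396480, 916),
    "gettimeofday": (2.659035352, 22456),
}

CAPTURE_SECONDS = 10  # assumption: the capture ran for "about 10 seconds"


def summarize(run, wait_call):
    """Derive rough ratios, assuming one sendto() per transaction."""
    transactions = run["sendto"][1]
    total_seconds = sum(seconds for seconds, _ in run.values())
    return {
        "tps_per_backend": transactions / CAPTURE_SECONDS,
        "gettimeofday_per_txn": run["gettimeofday"][1] / transactions,
        "waits_per_second": run[wait_call][1] / CAPTURE_SECONDS,
        "gettimeofday_time_share": run["gettimeofday"][0] / total_seconds,
    }


for name, run, wait_call in (("kqueue", kqueue_run, "kevent"),
                             ("poll", poll_run, "poll")):
    stats = summarize(run, wait_call)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Under those assumptions both builds make about 7 gettimeofday() calls
per transaction, and the wait primitive is entered about 0.6 times/sec
with kqueue versus about 92 times/sec with poll(). In both runs the
recvfrom error count equals the number of wait calls (6 and 916),
which is consistent with the backend waiting only after a read finds
nothing to read.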

I don't know what's going on here. Based on the reports so far, we
know that kqueue gives a speedup when using bare metal with pgbench
running on a different machine, but a slowdown when using
virtualisation and pgbench running on the same machine (and I just
checked that that's observable with both Unix sockets and TCP
sockets). That gave me the idea of looking at pgbench itself:

Unpatched:

syscall                     seconds   calls  errors
ppoll                   0.004869268       1       0
sendto                 16.489416911    7033       0
recvfrom               21.137606238    7049       0
                      ------------- ------- -------
                       37.631892417   14083       0

Patched:

syscall                     seconds   calls  errors
ppoll                   0.002773195       1       0
sendto                 16.597880468    7217       0
recvfrom               25.646406008    7238       0
                      ------------- ------- -------
                       42.247059671   14456       0
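
The pgbench-side difference is easier to see per call. A small Python
sketch, using only the figures from the two summaries above
(ms_per_call is a hypothetical helper name, and the per-call time is
whatever truss attributes to the call, which may include time spent
blocked):

```python
# Figures copied from the two pgbench-side truss -c summaries above:
# syscall -> (seconds, calls).
unpatched = {"sendto": (16.489416911, 7033), "recvfrom": (21.137606238, 7049)}
patched = {"sendto": (16.597880468, 7217), "recvfrom": (25.646406008, 7238)}


def ms_per_call(run, syscall):
    """Average time per call in milliseconds, as accounted by truss."""
    seconds, calls = run[syscall]
    return 1000.0 * seconds / calls


for syscall in ("sendto", "recvfrom"):
    before = ms_per_call(unpatched, syscall)
    after = ms_per_call(patched, syscall)
    change = 100.0 * (after - before) / before
    print(f"{syscall}: {before:.2f} ms -> {after:.2f} ms ({change:+.1f}%)")
```

That works out to roughly 3.0 ms per recvfrom() unpatched versus
roughly 3.5 ms patched, about 18% more, while sendto() stays within a
couple of percent.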

I don't know why the existence of the kqueue should make recvfrom()
slower on the pgbench side. That's probably something to look into
off-line with some FreeBSD guru help. Degraded performance for
clients on the same machine does seem to be a show stopper for this
patch for now. Thanks for testing!

--
Thomas Munro
http://www.enterprisedb.com
