Re: kqueue

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: kqueue
Date: 2016-09-10 01:29:00
Message-ID: CAEepm=3B5P_-aSAGUUHqtN06UezKW-zXxD+jF3KoGPmg_O6wPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 7, 2016 at 12:32 AM, Marko Tiikkaja <marko(at)joh(dot)to> wrote:
> I've tested and reviewed this, and it looks good to me, other than this
> part:
>
> + /*
> + * kevent guarantees that the change list has been processed in the
> EINTR
> + * case. Here we are only applying a change list so EINTR counts as
> + * success.
> + */
>
> this doesn't seem to be guaranteed on old versions of FreeBSD or any other
> BSD flavors, so I don't think it's a good idea to bake the assumption into
> this code. Or what do you think?

Thanks for the testing and review!

Hmm. Well spotted. I wrote that because the man page from FreeBSD 10.3 says:

When kevent() call fails with EINTR error, all changes in the changelist
have been applied.

This sentence is indeed missing from the OpenBSD, NetBSD and OSX man
pages. It was introduced by FreeBSD commit r280818[1] which made
kevent a Pthread cancellation point. I investigated whether it is
also true in older FreeBSD and the rest of the BSD family. I believe
the answer is yes.

1. That commit doesn't do anything that would change the situation:
it just adds thread cancellation wrapper code to libc and libthr which
exits under certain conditions but otherwise lets EINTR through to the
caller. So I think the new sentence is documentation of the existing
behaviour of the syscall.

2. I looked at the code in FreeBSD 4.1[2] (the original kqueue
implementation from which all others derive) and the four modern
OSes[3][4][5][6]. They vary a bit but in all cases, the first place
that can produce EINTR appears to be in kqueue_scan when the
(variously named) kernel sleep routine is invoked, which can return
EINTR or ERESTART (later translated to EINTR because kevent doesn't
support restarting). That comes after all changes have been applied.
In fact it's unreachable if nevents is 0: OSX doesn't call kqueue_scan
in that case, and the others return early from kqueue_scan in that
case.

3. An old email[7] from Jonathan Lemon (creator of kqueue) seems to
support that at least in respect of ancient FreeBSD. He wrote:
"Technically, an EINTR is returned when a signal interrupts the
process after it goes to sleep (that is, after it calls tsleep). So
if (as an example) you call kevent() with a zero valued timespec,
you'll never get EINTR, since there's no possibility of it sleeping."

So if I've understood correctly, what I wrote in the v4 patch is
universally true, but it's also moot in this case: kevent cannot fail
with errno == EINTR because nevents == 0. On that basis, here is a
new version with the comment and special case for EINTR removed.

[1] https://svnweb.freebsd.org/base?view=revision&revision=280818
[2] https://github.com/freebsd/freebsd/blob/release/4.1.0/sys/kern/kern_event.c
[3] https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_event.c
[4] https://github.com/IIJ-NetBSD/netbsd-src/blob/master/sys/kern/kern_event.c
[5] https://github.com/openbsd/src/blob/master/sys/kern/kern_event.c
[6] https://github.com/opensource-apple/xnu/blob/master/bsd/kern/kern_event.c
[7] http://marc.info/?l=freebsd-arch&m=98147346707952&w=2

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
kqueue-v5.patch application/octet-stream 14.8 KB

In response to

  • Re: kqueue at 2016-09-06 22:32:59 from Marko Tiikkaja

Responses

  • Re: kqueue at 2016-09-13 13:08:39 from Heikki Linnakangas

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-09-10 02:15:46 cost_sort() may need to be updated
Previous Message Claudio Freire 2016-09-10 01:21:58 Re: Tuplesort merge pre-reading