Re: [HACKERS] kqueue

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Rui DeSousa <rui(at)crazybean(dot)net>, Torsten Zuehlsdorff <mailinglists(at)toco-domains(dot)de>, Keith Fiske <keith(at)omniti(dot)com>, Matteo Beccati <php(at)beccati(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Marko Tiikkaja <marko(at)joh(dot)to>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: [HACKERS] kqueue
Date: 2020-01-20 19:03:29
Message-ID: 1847.1579547009@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
>> I took this patch for a quick spin on macOS. The result was that the
>> test suite hangs in the test src/test/recovery/t/017_shm.pl. I didn't
>> see any mentions of this anywhere in the thread, but that test is newer
>> than the beginning of this thread. Can anyone confirm or deny this
>> issue? Is it specific to macOS perhaps?

> Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
> HEAD. The core regression tests pass, as do the earlier recovery tests
> (I didn't try a full check-world though). Somewhere early in 017_shm.pl,
> things freeze up with four postmaster-child processes stuck in 100%-
> CPU-consuming loops.

I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12,
so it's not just macOS.

I now think that the autovac launcher isn't actually stuck in the way
that the other processes are. The ones that are actually consuming
CPU are the checkpointer, bgwriter, and walwriter. On the FreeBSD
box their stack traces are

(gdb) bt
#0 _close () at _close.S:3
#1 0x00000000007b4dd1 in FreeWaitEventSet (set=<optimized out>) at latch.c:660
#2 WaitLatchOrSocket (latch=0x80a1477a8, wakeEvents=<optimized out>, sock=-1,
    timeout=<optimized out>, wait_event_info=83886084) at latch.c:432
#3 0x000000000074a1b0 in CheckpointerMain () at checkpointer.c:514
#4 0x00000000005691e2 in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90)
    at bootstrap.c:461

(gdb) bt
#0 _fcntl () at _fcntl.S:3
#1 0x0000000800a6cd84 in fcntl (fd=4, cmd=2)
    at /usr/src/lib/libc/sys/fcntl.c:56
#2 0x00000000007b4eb5 in CreateWaitEventSet (context=<optimized out>,
    nevents=<optimized out>) at latch.c:625
#3 0x00000000007b4c82 in WaitLatchOrSocket (latch=0x80a147b00, wakeEvents=41,
    sock=-1, timeout=200, wait_event_info=83886083) at latch.c:389
#4 0x0000000000749ecd in BackgroundWriterMain () at bgwriter.c:304
#5 0x00000000005691dd in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90)
    at bootstrap.c:456

(gdb) bt
#0 _kevent () at _kevent.S:3
#1 0x00000000007b58a1 in WaitEventAdjustKqueue (set=0x800e6a120,
    event=0x800e6a170, old_events=<optimized out>) at latch.c:1034
#2 0x00000000007b4d87 in AddWaitEventToSet (set=<optimized out>,
    events=<error reading variable: Cannot access memory at address 0x10>,
    fd=-1, latch=<optimized out>, user_data=<optimized out>) at latch.c:778
#3 WaitLatchOrSocket (latch=0x80a147e58, wakeEvents=41, sock=-1,
    timeout=5000, wait_event_info=83886093) at latch.c:410
#4 0x000000000075b349 in WalWriterMain () at walwriter.c:256
#5 0x00000000005691ec in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90)
    at bootstrap.c:467

Note that these are just snapshots --- it looks like these processes
are repeatedly creating and destroying WaitEventSets; they're not
stuck inside the kernel.
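
For readers who want to see the shape of that loop outside the backend, here
is a minimal sketch. It is not the latch.c code: every demo_* name is
hypothetical, and a pipe polled with poll() stands in for the kqueue so the
example also builds on systems without kqueue. The only detail taken from
the traces is the call shape: each one-shot wait creates a descriptor, marks
it with fcntl(fd, F_SETFD, ...) (cmd=2 in the bgwriter trace is F_SETFD on
FreeBSD), waits, and closes it again. Why the real waits return immediately
is not established here; the sketch simply forces that case by making the
event ready before each wait.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for a WaitEventSet: a pipe whose read end is polled. */
typedef struct DemoWaitSet
{
    int rfd;
    int wfd;
} DemoWaitSet;

/* Create the set and mark both descriptors close-on-exec. */
static int
demo_create(DemoWaitSet *set)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) == -1 ||
        fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    set->rfd = fds[0];
    set->wfd = fds[1];
    return 0;
}

/* Destroy the set: this is where the close() calls come from. */
static void
demo_free(DemoWaitSet *set)
{
    close(set->rfd);
    close(set->wfd);
}

/*
 * One-shot wait: create, wait, free.
 * Returns 1 if the event was ready, 0 on timeout, -1 on error.
 */
static int
demo_wait_once(int already_signalled, int timeout_ms)
{
    DemoWaitSet set;
    struct pollfd pfd;
    int rc;

    if (demo_create(&set) != 0)
        return -1;
    if (already_signalled && write(set.wfd, "x", 1) != 1)
    {
        demo_free(&set);
        return -1;
    }
    pfd.fd = set.rfd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    demo_free(&set);
    return (rc > 0) ? 1 : rc;
}

/* Run max_calls one-shot waits and count how many reported a ready event. */
int
demo_count_wakeups(int already_signalled, int max_calls, int timeout_ms)
{
    int wakeups = 0;
    int i;

    assert(max_calls >= 0);
    for (i = 0; i < max_calls; i++)
    {
        if (demo_wait_once(already_signalled, timeout_ms) == 1)
            wakeups++;
    }
    return wakeups;
}
```

Calling demo_count_wakeups(1, n, 5000) finishes without ever sleeping, since
every wait reports a ready event at once; a caller that loops on such a wait
burns CPU in create/fcntl/close cycles. Sampling it with a debugger would
show snapshots in different syscalls, much like the three traces above.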

regards, tom lane
