Re: [HACKERS] kqueue

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Rui DeSousa <rui(at)crazybean(dot)net>, Torsten Zuehlsdorff <mailinglists(at)toco-domains(dot)de>, Keith Fiske <keith(at)omniti(dot)com>, Matteo Beccati <php(at)beccati(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Marko Tiikkaja <marko(at)joh(dot)to>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: [HACKERS] kqueue
Date: 2020-01-20 16:44:20
Message-ID: 16202.1579538660@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
> I took this patch for a quick spin on macOS. The result was that the
> test suite hangs in the test src/test/recovery/t/017_shm.pl. I didn't
> see any mentions of this anywhere in the thread, but that test is newer
> than the beginning of this thread. Can anyone confirm or deny this
> issue? Is it specific to macOS perhaps?

Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
HEAD. The core regression tests pass, as do the earlier recovery tests
(I didn't try a full check-world though). Somewhere early in 017_shm.pl,
things freeze up with four postmaster-child processes stuck in 100%-
CPU-consuming loops. I captured stack traces:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbb6 libsystem_kernel.dylib`kqueue + 10
    frame #1: 0x0000000105511533 postgres`CreateWaitEventSet(context=<unavailable>, nevents=<unavailable>) at latch.c:622:19 [opt]
    frame #2: 0x0000000105511305 postgres`WaitLatchOrSocket(latch=0x0000000112e02da4, wakeEvents=41, sock=-1, timeout=237000, wait_event_info=83886084) at latch.c:389:22 [opt]
    frame #3: 0x00000001054a7073 postgres`CheckpointerMain at checkpointer.c:514:10 [opt]
    frame #4: 0x00000001052da390 postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:461:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbce libsystem_kernel.dylib`kevent + 10
    frame #1: 0x0000000105511ddc postgres`WaitEventAdjustKqueue(set=0x00007fc8e8805920, event=0x00007fc8e8805958, old_events=<unavailable>) at latch.c:1034:7 [opt]
    frame #2: 0x0000000105511638 postgres`AddWaitEventToSet(set=<unavailable>, events=<unavailable>, fd=<unavailable>, latch=<unavailable>, user_data=<unavailable>) at latch.c:778:2 [opt]
    frame #3: 0x0000000105511342 postgres`WaitLatchOrSocket(latch=0x0000000112e030f4, wakeEvents=41, sock=-1, timeout=200, wait_event_info=83886083) at latch.c:397:3 [opt]
    frame #4: 0x00000001054a6d69 postgres`BackgroundWriterMain at bgwriter.c:304:8 [opt]
    frame #5: 0x00000001052da38b postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:456:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff65549c66 libsystem_kernel.dylib`close + 10
    frame #1: 0x0000000105511466 postgres`WaitLatchOrSocket [inlined] FreeWaitEventSet(set=<unavailable>) at latch.c:660:2 [opt]
    frame #2: 0x000000010551145d postgres`WaitLatchOrSocket(latch=0x0000000112e03444, wakeEvents=<unavailable>, sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:432 [opt]
    frame #3: 0x00000001054b8685 postgres`WalWriterMain at walwriter.c:256:10 [opt]
    frame #4: 0x00000001052da39a postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:467:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff655515be libsystem_kernel.dylib`__select + 10
    frame #1: 0x00000001056a6191 postgres`pg_usleep(microsec=<unavailable>) at pgsleep.c:56:10 [opt]
    frame #2: 0x00000001054abe12 postgres`backend_read_statsfile at pgstat.c:5720:3 [opt]
    frame #3: 0x00000001054adcc0 postgres`pgstat_fetch_stat_dbentry(dbid=<unavailable>) at pgstat.c:2431:2 [opt]
    frame #4: 0x00000001054a320c postgres`do_start_worker at autovacuum.c:1248:20 [opt]
    frame #5: 0x00000001054a2639 postgres`AutoVacLauncherMain [inlined] launch_worker(now=632853327674576) at autovacuum.c:1357:9 [opt]
    frame #6: 0x00000001054a2634 postgres`AutoVacLauncherMain(argc=<unavailable>, argv=<unavailable>) at autovacuum.c:769 [opt]
    frame #7: 0x00000001054a1ea7 postgres`StartAutoVacLauncher at autovacuum.c:415:4 [opt]

I'm not sure how much faith to put in the last couple of those, as
stopping the earlier processes could perhaps have had side-effects.
But evidently 017_shm.pl is doing something that interferes with
our ability to create kqueue-based WaitEventSets.
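For anyone not fluent in the syscalls those frames bottom out in, here is a
minimal, self-contained sketch of the BSD kqueue interface they exercise:
create a queue, register a filter with kevent(), wait for it.  This is only
an illustration of the kernel API, not the latch.c code; the
EVFILT_PROC/NOTE_EXIT filter is an assumption about how a kqueue-based
latch might watch for postmaster death, and getppid() merely stands in for
the postmaster PID.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	int			kq = kqueue();		/* cf. frame #0 of the first trace */
	struct kevent change;
	struct kevent result;
	pid_t		parent = getppid();	/* stand-in for the postmaster PID */

	if (kq < 0)
	{
		perror("kqueue");
		return 1;
	}

	/* Register interest in the watched process exiting. */
	EV_SET(&change, parent, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, NULL);
	if (kevent(kq, &change, 1, NULL, 0, NULL) < 0)
	{
		/*
		 * Registration can fail, e.g. with ESRCH if the watched process is
		 * already gone.  A caller that responds by tearing down the set and
		 * immediately retrying the whole create/register/close cycle would
		 * spin, which is one plausible way to produce the 100%-CPU symptom
		 * described above.  (Speculation only; not established here.)
		 */
		perror("kevent (register)");
		close(kq);
		return 1;
	}

	/* Block until the watched process exits, then report it. */
	if (kevent(kq, NULL, 0, &result, 1, NULL) == 1 &&
		result.filter == EVFILT_PROC)
		printf("process %d exited\n", (int) result.ident);

	close(kq);
	return 0;
}

The three spinning processes are each at a different step of that cycle
(kqueue creation, kevent registration, close during teardown), which is
consistent with WaitLatchOrSocket repeatedly building and freeing a
WaitEventSet rather than ever blocking in it.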

regards, tom lane
