Re: assertion at postmaster start

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: assertion at postmaster start
Date: 2019-06-17 17:28:19
Message-ID: 20190617172819.k3kq6jboo6toe42d@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-06-15 12:09:50 -0400, Alvaro Herrera wrote:
> Once in a blue moon I get this assertion failure on server start:
>
> 2019-06-15 12:00:29.650 -04 [30080] LOG: iniciando PostgreSQL 12beta1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0, 64-bit
> 2019-06-15 12:00:29.650 -04 [30080] LOG: escuchando en la dirección IPv4 «127.0.0.1», port 55432
> 2019-06-15 12:00:29.650 -04 [30080] LOG: escuchando en el socket Unix «/tmp/.s.PGSQL.55432»
> 2019-06-15 12:00:29.658 -04 [30956] LOG: el sistema de bases de datos fue apagado en 2019-06-15 12:00:24 -04
> 2019-06-15 12:00:29.659 -04 [30080] LOG: proceso de servidor (PID 30107) terminó con código de salida 15
> 2019-06-15 12:00:29.659 -04 [30080] LOG: terminando todos los otros procesos de servidor activos
> TRAP: FailedAssertion(«!(AbortStartTime == 0)», Archivo: «/pgsql/source/master/src/backend/postmaster/postmaster.c», Línea: 2957)
> Aborted (core dumped)
>
> Apologies for the Spanish -- I cannot readily reproduce this. In
> essence, this shows a normal startup, until suddenly process 30107
> terminates with exit code 15, and then while shutting everything down,
> postmaster hits the aforementioned assertion and terminates.

I assume this is on master as of a few days ago? This doesn't even look
to be *after* a crash-restart? And I assume core files weren't enabled?

> One problem with debugging this is that I don't know what process 30107
> is, since the logs don't mention it.

Hm - it probably can't be that many processes, it looks like 30107 has
to have started pretty soon after the startup process (which IIRC is the
one emitting "el sistema de bases de datos fue apagado en"), and as soon
as that's done 30107 is noticed as having crashed.

Unfortunately, as this appears to be a start in a clean database, we
don't really know which phase of startup this is. There's IIRC no
messages to be expected before "database system is ready to accept
connections" in a clean start.

What is a bit odd is that:

> 2019-06-15 12:00:29.659 -04 [30080] LOG: proceso de servidor (PID 30107) terminó con código de salida 15

comes from:
#. translator: %s is a noun phrase describing a child process, such as
#. "server process"
#: postmaster/postmaster.c:3656
#, c-format
msgid "%s (PID %d) exited with exit code %d"
msgstr "%s (PID %d) terminó con código de salida %d"

#: postmaster/postmaster.c:3301 postmaster/postmaster.c:3321
#: postmaster/postmaster.c:3328 postmaster/postmaster.c:3346
msgid "server process"
msgstr "proceso de servidor"

And "server process" is afaict only used for actual backends, not other
types of processes. But we've not yet seen "database system is ready to
accept accept connections", so IIRC it could only be a "dead_end" type
backend? But we didn't yet see an error from that...

> No idea what is going on.

Seems to indicate a logic error in postmaster's state machine. Perhaps
something related to dead_end processes?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-06-17 17:33:24 Re: Fix up grouping sets reorder
Previous Message Alvaro Herrera 2019-06-17 17:09:24 Re: initial random incompatibility