Re: failure in 019_replslot_limit

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: failure in 019_replslot_limit
Date: 2024-02-09 18:59:15
Message-ID: 20240209185915.btlqlp6of3zc6qxi@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2024-02-09 18:00:01 +0300, Alexander Lakhin wrote:
> I've managed to reproduce this issue (which still persists:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-02-04%2001%3A53%3A44
> ) and saw that it's not checkpointer, but walsender is hanging:

How did you reproduce this?

> And I see the walsender process still running (I've increased the timeout
> to keep the test running and to connect to the process in question), with
> the following stack trace:
> #0  0x00007fe4feac3d16 in epoll_wait (epfd=5, events=0x55b279b70f38, maxevents=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1  0x000055b278b9ab32 in WaitEventSetWaitBlock (set=set@entry=0x55b279b70eb8, cur_timeout=cur_timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffda5ffac90, nevents=nevents@entry=1) at latch.c:1571
> #2  0x000055b278b9b6b6 in WaitEventSetWait (set=0x55b279b70eb8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffda5ffac90, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=100663297) at latch.c:1517
> #3  0x000055b278a3f11f in secure_write (port=0x55b279b65aa0, ptr=ptr@entry=0x55b279bfbd08, len=len@entry=21470) at be-secure.c:296
> #4  0x000055b278a460dc in internal_flush () at pqcomm.c:1356
> #5  0x000055b278a461d4 in internal_putbytes (s=s@entry=0x7ffda5ffad3c "E\177", len=len@entry=1) at pqcomm.c:1302

So it's the issue where we effectively wait forever to send a FATAL error. I've
previously proposed that we should not block when sending out fatal errors,
given that blocking allows clients to prevent graceful restarts and a lot of
other things.
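A minimal sketch of what "not blocking" on a FATAL could look like, assuming a plain POSIX file descriptor rather than PostgreSQL's actual secure_write()/latch machinery seen in the trace: the descriptor is switched to non-blocking mode and the send is abandoned after a bounded wait, instead of waiting with a timeout of -1 as in frame #2. The helper send_fatal_bounded() and the SEND_* result codes are hypothetical names for illustration, not PostgreSQL APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result codes, not part of PostgreSQL. */
enum send_result { SEND_OK = 0, SEND_TIMEOUT = 1, SEND_ERROR = 2 };

/* Milliseconds elapsed on a monotonic clock since *start. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: try to write all of buf to fd, but give up once
 * timeout_ms has passed instead of waiting indefinitely for a peer that
 * never drains its receive buffer.  The descriptor is left non-blocking,
 * on the assumption that the caller closes it right after the FATAL.
 */
static enum send_result send_fatal_bounded(int fd, const char *buf,
                                           size_t len, int timeout_ms)
{
    struct timespec start;
    size_t sent = 0;
    int flags;

    assert(timeout_ms >= 0);

    /* Non-blocking mode, so write() never parks us in the kernel. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return SEND_ERROR;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (sent < len)
    {
        ssize_t n = write(fd, buf + sent, len - sent);

        if (n > 0)
        {
            sent += (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return SEND_ERROR;

        /* Buffer is full: wait for writability, but only until the deadline. */
        long remaining = timeout_ms - elapsed_ms(&start);
        if (remaining <= 0)
            return SEND_TIMEOUT;

        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc = poll(&pfd, 1, (int) remaining);

        if (rc == 0)
            return SEND_TIMEOUT;
        if (rc < 0 && errno != EINTR)
            return SEND_ERROR;
    }
    return SEND_OK;
}
```

With a peer that never reads, a small message still goes out immediately, while a message larger than the kernel buffer returns SEND_TIMEOUT after roughly timeout_ms rather than hanging. A real fix would also have to cover SSL/GSS connections and interrupt handling, which this sketch does not attempt.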

Greetings,

Andres Freund
