Re: backtrace_on_internal_error

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backtrace_on_internal_error
Date: 2023-12-09 21:06:42
Message-ID: 20231209210642.h3uxp66annsueufh@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-12-09 12:41:30 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> >> I was wondering about that too. But if we do so, why not also do it for
> >> writes?
>
> > Writes don't act that way, do they? EOF on a pipe gives you an error,
> > not silently reporting that zero bytes were written and leaving you
> > to retry indefinitely.
>
> On further reflection I realized that you're right so far as the SSL
> code path goes, because SSL_write() can involve physical reads as well
> as writes, so at least in principle it's possible that we'd see EOF
> reported this way from that function.

Heh. I'll just claim that's what I was thinking about.

> Also, the libpq side does need work of the same sort, leading to the
> v2-0001 patch attached.

I'd perhaps add a comment explaining why it's plausible that we'd see that
that in the write case.

> I also realized that we have more or less the same problem at the
> caller level, allowing a similar failure for non-SSL connections.
> So I'm also proposing 0002 attached. Your results from aggregated
> logs didn't show "could not receive data from client: Success" as a
> common case, but since we weren't bothering to zero errno beforehand,
> it's likely that such failures would show up with very random errnos.

I did only look at the top ~100 internal errors, after trying to filter out
extensions, i.e. the list wasn't exhaustive. There's also very few non-ssl
connections. But I just checked, and for that error message, I do see some
XX000, but only in older versions. There's ENETUNREACH, ECONNRESET,
ETIMEDOUT, EHOSTUNREACH which these days are all handled as non XX000 by
errcode_for_socket_access().

> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
> index bd72a87bbb..76b647ce1c 100644
> --- a/src/interfaces/libpq/fe-secure.c
> +++ b/src/interfaces/libpq/fe-secure.c
> @@ -211,6 +211,8 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
> int result_errno = 0;
> char sebuf[PG_STRERROR_R_BUFLEN];
>
> + SOCK_ERRNO_SET(0);
> +
> n = recv(conn->sock, ptr, len, 0);
>
> if (n < 0)
> @@ -232,6 +234,7 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
>
> case EPIPE:
> case ECONNRESET:
> + case 0: /* treat as EOF */
> libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
> "\tThis probably means the server terminated abnormally\n"
> "\tbefore or while processing the request.");

If we were treating it as EOF, we'd not "queue" an error message, no? Normally
recv() returns 0 in that case, so we'd just return, right?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-12-09 21:39:37 Re: encoding affects ICU regex character classification
Previous Message Sutou Kouhei 2023-12-09 21:01:36 Re: Make COPY format extendable: Extract COPY TO format implementations