Re: libpq debug log

From: "'alvherre(at)alvh(dot)no-ip(dot)org'" <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "iwata(dot)aya(at)fujitsu(dot)com" <iwata(dot)aya(at)fujitsu(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq debug log
Date: 2021-03-31 21:21:10
Message-ID: 20210331212110.GA12557@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-Mar-31, Tom Lane wrote:

> I think this is a timing problem that's triggered (on some machines)
> by force_parallel_mode = regress. Looking at spurfowl's latest
> failure of this type, the postmaster log shows
>
> 2021-03-31 14:34:54.982 EDT [18233:15] 001_libpq_pipeline.pl LOG: execute <unnamed>: SELECT 1.0/g FROM generate_series(3, -1, -1) g
> 2021-03-31 14:34:54.992 EDT [18234:1] ERROR: division by zero
> 2021-03-31 14:34:54.992 EDT [18234:2] STATEMENT: SELECT 1.0/g FROM generate_series(3, -1, -1) g
> 2021-03-31 14:34:54.993 EDT [18233:16] 001_libpq_pipeline.pl ERROR: division by zero
> 2021-03-31 14:34:54.993 EDT [18233:17] 001_libpq_pipeline.pl STATEMENT: SELECT 1.0/g FROM generate_series(3, -1, -1) g
> 2021-03-31 14:34:54.995 EDT [18216:4] LOG: background worker "parallel worker" (PID 18234) exited with exit code 1
> 2021-03-31 14:34:54.995 EDT [18233:18] 001_libpq_pipeline.pl LOG: could not send data to client: Broken pipe
> 2021-03-31 14:34:54.995 EDT [18233:19] 001_libpq_pipeline.pl FATAL: connection to client lost
>
> We can see that the division by zero occurred in a parallel worker.
> My theory is that in parallel mode, it's uncertain whether the
> error will be reported before or after the "preceding" successful
> output rows. So you need to disable parallelism to make this
> test case stable.

Ooh, that makes sense, thanks. Added a SET in the program itself to set
it to off.

This is not the *only* issue though; at least animal drongo shows a
completely different failure, where the last few tests don't even get to
run, and the server log just has this:

2021-03-31 20:20:14.460 UTC [178364:4] 001_libpq_pipeline.pl LOG: disconnection: session time: 0:00:00.469 user=pgrunner database=postgres host=127.0.0.1 port=52638
2021-03-31 20:20:14.681 UTC [178696:1] [unknown] LOG: connection received: host=127.0.0.1 port=52639
2021-03-31 20:20:14.687 UTC [178696:2] [unknown] LOG: connection authorized: user=pgrunner database=postgres application_name=001_libpq_pipeline.pl
2021-03-31 20:20:15.157 UTC [178696:3] 001_libpq_pipeline.pl LOG: could not receive data from client: An existing connection was forcibly closed by the remote host.

I suppose the program is crashing for some reason.

--
Álvaro Herrera Valdivia, Chile
"Los dioses no protegen a los insensatos. Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-03-31 21:40:38 Re: libpq debug log
Previous Message Andres Freund 2021-03-31 21:19:31 Re: What to call an executor node which lazily caches tuples in a hash table?