Re: libpq debug log

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "'alvherre(at)alvh(dot)no-ip(dot)org'" <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "iwata(dot)aya(at)fujitsu(dot)com" <iwata(dot)aya(at)fujitsu(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "'Kyotaro Horiguchi'" <horikyota(dot)ntt(at)gmail(dot)com>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq debug log
Date: 2021-03-31 20:15:24
Message-ID: 3304521.1617221724@sss.pgh.pa.us
Lists: pgsql-hackers

"'alvherre(at)alvh(dot)no-ip(dot)org'" <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> So crake failed. The failure is that it doesn't print the DataRow
> messages. That's quite odd. We see this in the trace log:

I think this is a timing problem that's triggered (on some machines)
by force_parallel_mode = regress. Looking at spurfowl's latest
failure of this type, the postmaster log shows

2021-03-31 14:34:54.982 EDT [18233:15] 001_libpq_pipeline.pl LOG: execute <unnamed>: SELECT 1.0/g FROM generate_series(3, -1, -1) g
2021-03-31 14:34:54.992 EDT [18234:1] ERROR: division by zero
2021-03-31 14:34:54.992 EDT [18234:2] STATEMENT: SELECT 1.0/g FROM generate_series(3, -1, -1) g
2021-03-31 14:34:54.993 EDT [18233:16] 001_libpq_pipeline.pl ERROR: division by zero
2021-03-31 14:34:54.993 EDT [18233:17] 001_libpq_pipeline.pl STATEMENT: SELECT 1.0/g FROM generate_series(3, -1, -1) g
2021-03-31 14:34:54.995 EDT [18216:4] LOG: background worker "parallel worker" (PID 18234) exited with exit code 1
2021-03-31 14:34:54.995 EDT [18233:18] 001_libpq_pipeline.pl LOG: could not send data to client: Broken pipe
2021-03-31 14:34:54.995 EDT [18233:19] 001_libpq_pipeline.pl FATAL: connection to client lost

We can see that the division by zero occurred in a parallel worker.
My theory is that in parallel mode, it's uncertain whether the
error will be reported before or after the "preceding" successful
output rows. So you need to disable parallelism to make this
test case stable.
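
One way to apply that, as a minimal sketch rather than the fix the test actually adopted: have the test client ask for a serial session through the conninfo "options" keyword, which passes "-c name=value" settings to the backend. The helper name build_serial_conninfo is hypothetical. force_parallel_mode is the setting named above (newer server releases call it debug_parallel_query), and setting max_parallel_workers_per_gather to 0 as well is an extra precaution assumed here, not something the thread asks for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append session options that switch parallel query
 * off to an existing conninfo string.  The result is meant to be handed to
 * PQconnectdb().  Returns 0 on success, -1 if the buffer is too small.
 */
static int
build_serial_conninfo(char *buf, size_t buflen, const char *base)
{
    int n;

    assert(buf != NULL && base != NULL);

    /* The value contains spaces, so it must be single-quoted in conninfo. */
    n = snprintf(buf, buflen,
                 "%s options='-c force_parallel_mode=off"
                 " -c max_parallel_workers_per_gather=0'",
                 base);

    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

With no parallel worker involved, the rows and the division-by-zero error come from the same backend, so the order in which the client sees them should no longer depend on timing. Issuing an equivalent SET command right after connecting would have the same effect.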

regards, tom lane
