Re: Logging parallel worker draught

From: Benoit Lobréau <benoit(dot)lobreau(at)dalibo(dot)com>
To: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Logging parallel worker draught
Date: 2023-10-12 10:01:46
Message-ID: 11e34b80-b0a6-e2e4-1606-1f5077379a34@dalibo.com
Lists: pgsql-hackers

On 10/11/23 17:26, Imseih (AWS), Sami wrote:

Thank you for resurrecting this thread.

>> Well, if you read Benoit's earlier proposal at [1] you'll see that he
>> does propose to have some cumulative stats; this LOG line he proposes
>> here is not a substitute for stats, but rather a complement. I don't
>> see any reason to reject this patch even if we do get stats.

I believe both cumulative statistics and logs are needed. Logs excel at
pinpointing specific queries at precise times, while statistics provide
a broader overview of the situation. Additionally, I often encounter
situations where clients lack pg_stat_statements (which requires a restart
to install) and cannot restart their production instance promptly.

> Regarding the current patch, the latest version removes the separate GUC,
> but the user should be able to control this behavior.

I created this version of the patch in response to Amit Kapila's
suggestion, in order to keep the discussion moving. However, I still favor
the initial version with the GUCs.
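
For reference, the earlier versions exposed this as a regular logging
parameter that could be toggled per session or in postgresql.conf; a minimal
sketch, assuming the boolean GUC keeps the name it had in those patches
(the name may well change):

[local]:5437 postgres@postgres=# SET log_parallel_worker_draught = on;  -- name from the earlier patches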

> Query text is logged when log_min_error_statement > default level of "error".
>
> This could be especially problematic when there is a query running more than 1 Parallel
> Gather node that is in draught. In those cases each node will end up
> generating a log with the statement text. So, a single query execution could end up
> having multiple log lines with the statement text.
> ...
> I wonder if it will be better to accumulate the total # of workers planned and # of workers launched and
> logging this information at the end of execution?

log_temp_files exhibits similar behavior when a query involves multiple
on-disk sorts: each sort logs its own temporary files, so a single query
execution already produces several log entries. I'm not sure this is
something we need to address, but I'll explore whether the message can be
made more informative. For example:

[local]:5437 postgres@postgres=# SET work_mem TO '125kB';
[local]:5437 postgres@postgres=# SET log_temp_files TO 0;
[local]:5437 postgres@postgres=# SET client_min_messages TO log;
[local]:5437 postgres@postgres=# WITH a AS (SELECT x FROM generate_series(1,10000) AS F(x) ORDER BY 1),
                                      b AS (SELECT x FROM generate_series(1,10000) AS F(x) ORDER BY 1)
                                 SELECT * FROM a, b;
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp138850.20", size 122880   => First sort
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp138850.19", size 140000
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp138850.23", size 140000
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp138850.22", size 122880   => Second sort
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp138850.21", size 140000
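
For completeness, the worker draught itself is easy to reproduce, but today
the shortfall is only visible when you run EXPLAIN ANALYZE yourself, which
is exactly the information the proposed log line would capture after the
fact (the table and settings below are just for the demonstration):

[local]:5437 postgres@postgres=# CREATE TABLE t AS SELECT g AS id FROM generate_series(1, 1000000) g;
[local]:5437 postgres@postgres=# SET max_parallel_workers_per_gather = 2;
[local]:5437 postgres@postgres=# SET max_parallel_workers = 0;   -- no workers available at execution time
[local]:5437 postgres@postgres=# EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM t;
...
   ->  Gather (actual time=... rows=... loops=1)
         Workers Planned: 2
         Workers Launched: 0
...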

--
Benoit Lobréau
Consultant
http://dalibo.com
