From: | "Imseih (AWS), Sami" <simseih(at)amazon(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Benoit Lobréau <benoit(dot)lobreau(at)dalibo(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: Logging parallel worker draught |
Date: | 2023-10-11 15:26:49 |
Message-ID: | 9E9E69BD-BABB-49BB-8B69-61939179F20D@amazon.com |
Lists: | pgsql-hackers |
>> Currently explain ( analyze ) will give you the "Workers Planned"
>> and "Workers launched". Logging this via auto_explain is possible, so I am
>> not sure we need additional GUCs or debug levels for this info.
>>
>> -> Gather (cost=10430.00..10430.01 rows=2 width=8) (actual time=131.826..134.325 rows=3 loops=1)
>> Workers Planned: 2
>> Workers Launched: 2
> I don't think autoexplain is a good substitute for the originally
> proposed log line. The possibility for log bloat is enormous. Some
> explain plans are gigantic, and I doubt people can afford that kind of
> log traffic just in case these numbers don't match.
Correct, that is a downside of auto_explain in general.
The log traffic can, of course, be limited with auto_explain.log_min_duration,
auto_explain.sample_rate, and similar settings.
> Well, if you read Benoit's earlier proposal at [1] you'll see that he
> does propose to have some cumulative stats; this LOG line he proposes
> here is not a substitute for stats, but rather a complement. I don't
> see any reason to reject this patch even if we do get stats.
> Also, we do have a patch on stats, by Sotolongo and Bonne here [2]. I
Thanks. I will review the threads in depth and see if the ideas can be combined
in a comprehensive proposal.
Regarding the current patch, the latest version removes the separate GUC,
but the user should be able to control this behavior.
Query text is logged when log_min_error_statement > default level of "error".
This could be especially problematic when a query runs more than one Parallel
Gather node that is in draught. In that case each node generates its own log
entry with the statement text, so a single query execution can end up with
multiple log lines that all repeat the same statement text.
For example:
LOG: Parallel Worker draught during statement execution: workers spawned 0, requested 2
STATEMENT: select (select count(*) from large) as a, (select count(*) from large) as b, (select count(*) from large) as c ;
LOG: Parallel Worker draught during statement execution: workers spawned 0, requested 2
STATEMENT: select (select count(*) from large) as a, (select count(*) from large) as b, (select count(*) from large) as c ;
LOG: Parallel Worker draught during statement execution: workers spawned 0, requested 2
STATEMENT: select (select count(*) from large) as a, (select count(*) from large) as b, (select count(*) from large) as c ;
I wonder if it would be better to accumulate the total number of workers planned and the total
number of workers launched, and log this information once at the end of execution?
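As a rough sketch of that accumulation idea (plain standalone C, not PostgreSQL code;
the struct and function names below are hypothetical and do not exist in the tree, and
the message wording is only illustrative), each Gather node could add its counts to a
per-statement tally, and a single line would be formatted at the end of execution:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-statement tally; not an existing PostgreSQL structure. */
typedef struct ParallelWorkerTally
{
    int nodes;              /* Gather nodes that reported their counts */
    int workers_planned;    /* sum of workers planned across those nodes */
    int workers_launched;   /* sum of workers actually launched */
} ParallelWorkerTally;

/* Reset the tally at the start of statement execution. */
static void
tally_reset(ParallelWorkerTally *t)
{
    t->nodes = 0;
    t->workers_planned = 0;
    t->workers_launched = 0;
}

/* Called once per Gather node instead of emitting a log line right away. */
static void
tally_add(ParallelWorkerTally *t, int planned, int launched)
{
    /* Assumption: a node never launches more workers than were planned. */
    assert(launched >= 0 && launched <= planned);
    t->nodes++;
    t->workers_planned += planned;
    t->workers_launched += launched;
}

/* True if, over the whole statement, fewer workers ran than were planned. */
static bool
tally_has_shortfall(const ParallelWorkerTally *t)
{
    return t->workers_launched < t->workers_planned;
}

/*
 * Format one summary line for the whole statement.
 * Returns 0 and leaves buf untouched if there is nothing to report.
 */
static int
tally_format(const ParallelWorkerTally *t, char *buf, size_t buflen)
{
    if (!tally_has_shortfall(t))
        return 0;
    return snprintf(buf, buflen,
                    "Parallel Worker draught during statement execution: "
                    "workers spawned %d, requested %d, in %d Gather node(s)",
                    t->workers_launched, t->workers_planned, t->nodes);
}
```

With the three-subquery example above, the three Gather nodes would each call
tally_add(&t, 2, 0), and the end-of-execution hook would emit one line (with one
STATEMENT entry) reporting 0 spawned out of 6 requested, rather than three.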
Regards,
Sami