Re: Logging parallel worker draught

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Benoit Lobréau <benoit(dot)lobreau(at)dalibo(dot)com>, "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Logging parallel worker draught
Date: 2024-02-27 14:09:24
Message-ID: 2cccd814-d7b3-4c9a-a66a-6df4c45fd95e@enterprisedb.com
Lists: pgsql-hackers

On 2/27/24 10:55, Benoit Lobréau wrote:
> On 2/25/24 20:13, Tomas Vondra wrote:
>> 1) name of the GUC
> ...
>> 2) logging just the failures provides an incomplete view
>> log_parallel_workers = {none | failures | all}
>> where "failures" only logs when at least one worker fails to start, and
>> "all" logs everything.
>>
>> AFAIK Sami made the same observation/argument in his last message [1].
>
> I like the name and different levels you propose. I was initially
> thinking that the overall picture would be better summarized (and
> easier to implement) with pg_stat_statements. But with the granularity
> you propose, the choice is in the hands of the DBA, which is great and
> provides more options when we don't have full control of the
> installation.
>

Good that you like the idea with multiple levels.
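
For illustration, I'd imagine it as a plain enum GUC, along these lines
(just a rough sketch, the names and exact values are of course open):

    /* hypothetical values for log_parallel_workers */
    typedef enum
    {
        LOG_PARALLEL_WORKERS_NONE,      /* log nothing */
        LOG_PARALLEL_WORKERS_FAILURES,  /* log only when workers fail to start */
        LOG_PARALLEL_WORKERS_ALL        /* log all parallel queries */
    } LogParallelWorkers;

    static const struct config_enum_entry log_parallel_workers_options[] = {
        {"none", LOG_PARALLEL_WORKERS_NONE, false},
        {"failures", LOG_PARALLEL_WORKERS_FAILURES, false},
        {"all", LOG_PARALLEL_WORKERS_ALL, false},
        {NULL, 0, false}
    };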

I agree pg_stat_statements may be an easier way to get a summary, but it
has limitations too (e.g. no built-in ability to show how the metrics
evolve over time, which is easier to reconstruct from logs). So I think
those approaches are complementary.

>> 3) node-level or query-level?
> ...
>> There's no value in having information about every individual temp file,
>> because we don't even know which node produced it. It'd be perfectly
>> sufficient to log some summary statistics (number of files, total space,
>> ...), either for the query as a whole, or perhaps per node (because
>> that's what matters for work_mem). So I don't think we should mimic this
>> just because log_temp_files does this.
>
> I must admit that, given my (poor) technical level, I went for the
> easiest way.
>

That's understandable, I'd do the same thing.

> If we go for the three levels you proposed, "node-level" makes even less
> sense (and has great "potential" for spam).
>

Perhaps.

> I can see one downside to the "query-level" approach: it might be more
> difficult to give information in cases where the query doesn't end
> normally. It's sometimes useful to have hints about what was going wrong
> before a query was cancelled or crashed, which log_temp_files kinda does.
>

That is certainly true, but it's not a new thing, I believe. IIRC we may
not report statistics until the end of the transaction, so there are no
progress updates, and I'm not sure what happens if the query doesn't end
correctly (e.g. the backend dies, ...). Similarly for temporary files, we
don't report those until the temporary file gets closed, so I'm not sure
you'd get a lot of info about the progress either.

I admit I haven't tried it, and maybe I'm remembering the details wrong.
Might be useful to experiment with this a little bit first ...

>> I don't know how difficult would it be to track/collect this information
>> for the whole query.
>
> I am a bit worried this will be a little too much for me, given the
> time and the knowledge gap I have (both in C and PostgreSQL internals).
> I'll try to look anyway.
>

I certainly understand that, and I realize it may feel daunting and even
discouraging. What I can promise is that I'm willing to help, e.g. by
suggesting ways to approach the problems, doing reviews, and so on.
Would that be sufficient for you to continue working on this patch?

Some random thoughts/ideas about how to approach this:

- For parallel workers I think this might be as simple as adding some
counters into QueryDesc and updating those during query execution
(instead of just logging stuff directly). I'm not sure if they should be
added to "Instrumentation" or kept separate.

- I was thinking maybe we could just walk the execution plan and collect
the counts that way. But I'm not sure that'd work if the Gather node
happens to be executed repeatedly (in a loop). Also, not sure we'd want
to walk all plans.
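
For example, something like this might do (an untested sketch using
planstate_tree_walker, called on queryDesc->planstate from ExecutorEnd,
say; it also shows the rescan problem, because nworkers_launched only
remembers the last launch):

    typedef struct WorkerCounts
    {
        int     planned;
        int     launched;
    } WorkerCounts;

    static bool
    count_parallel_workers(PlanState *planstate, void *context)
    {
        WorkerCounts *counts = (WorkerCounts *) context;

        if (IsA(planstate, GatherState))
        {
            GatherState *gs = (GatherState *) planstate;

            counts->planned += ((Gather *) gs->ps.plan)->num_workers;
            /* only reflects the most recent launch, if rescanned in a loop */
            counts->launched += gs->nworkers_launched;
        }
        else if (IsA(planstate, GatherMergeState))
        {
            GatherMergeState *gms = (GatherMergeState *) planstate;

            counts->planned += ((GatherMerge *) gms->ps.plan)->num_workers;
            counts->launched += gms->nworkers_launched;
        }

        return planstate_tree_walker(planstate, count_parallel_workers, context);
    }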

- While log_temp_files is clearly out of scope for this patch, it might
be useful to think about it and how it should behave. We've already used
log_temp_files to illustrate some issues with logging the info right
away, so maybe there's something to learn here ...

- For temporary files I think it'd be more complicated, because we can
create temporary files from many different places, not just in the
executor, so we can't simply update QueryDesc etc. Also, the places that
log info about temporary files (using ReportTemporaryFileUsage) only
really know about individual temporary files, so if a Sort or HashJoin
creates 1000 files, we'll get one LOG for each of those temp files, even
though they're really a single "combined" file. So maybe we should
introduce some sort of "context" to group those files, and only
accumulate/log the size for the group as a whole? Maybe it'd even allow
printing some info about what the temporary file is for (e.g. tuplestore
/ tuplesort / ...).
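
To make that a bit more concrete, I'm thinking of something like this
(entirely hypothetical, just to illustrate the grouping idea):

    /* hypothetical "context" grouping related temporary files */
    typedef struct TempFileContext
    {
        const char *purpose;        /* e.g. "tuplesort", "tuplestore", "hashjoin" */
        int         nfiles;         /* number of temp files in the group */
        uint64      total_bytes;    /* combined size of the group */
    } TempFileContext;

    /*
     * The file-closing path would accumulate into the current context,
     * instead of ReportTemporaryFileUsage() logging each file separately ...
     */
    static void
    AccumulateTemporaryFileUsage(TempFileContext *cxt, off_t size)
    {
        cxt->nfiles++;
        cxt->total_bytes += size;
    }

    /*
     * ... and the owner (tuplesort, hashjoin, ...) would emit a single
     * LOG line when releasing the whole group, subject to log_temp_files.
     */
    static void
    ReportTempFileContext(TempFileContext *cxt)
    {
        ereport(LOG,
                (errmsg("temporary files: %s, %d files, total size %llu bytes",
                        cxt->purpose, cxt->nfiles,
                        (unsigned long long) cxt->total_bytes)));
    }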

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
