Re: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski <dmigowski(at)ikoffice(dot)de>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements
Date: 2019-08-05 20:46:47
Message-ID: adb82679-8109-fd5c-ede0-12c4b9150e94@ikoffice.de
Lists: pgsql-hackers

On 2019-08-05 19:16, Andres Freund wrote:
> On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:
>> how do you want to generalize it? Are you thinking about a view solely
>> for the display of the memory usage of different objects?
> I'm not quite sure. I'm just not sure that adding separate
> infrastructure for various objects is a sustainable approach. We'd likely
> want to have this for prepared statements, for cursors, for the current
> statement, for various caches, ...
>
> I think an approach would be to add an 'owning_object' field to memory
> contexts, which has to point to a Node type if set. A table returning reporting
> function could recursively walk through the memory contexts, starting at
> TopMemoryContext. Whenever it encounters a context with owning_object
> set, it prints a string version of nodeTag(owning_object). For some node
> types it knows about (e.g. PreparedStatement, Portal, perhaps some of
> the caches), it prints additional metadata specific to the type (so for
> prepared statement it'd be something like 'prepared statement', '[name
> of prepared statement]'), and prints information about the associated
> context and all its children.
I understand. So it would be something like the output of
MemoryContextStatsInternal, but in table form and with some extra columns.
I would have loved to have this extra information in
MemoryContextStatsInternal already, btw, so it might be a good idea to
upgrade that function first to gather the information and then wrap a
table function around it afterwards.
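
To make that concrete, here is a minimal sketch of the traversal such a
table-producing function would have to do (purely illustrative, not a
patch). It only relies on fields MemoryContextData already has (name,
ident, firstchild, nextchild); the byte counters would still have to come
from the per-context stats callback, as MemoryContextStatsInternal does
today, and report_context_row is a made-up callback standing in for
"append one row to the result set":

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Sketch only: depth-first walk over the memory context tree, starting at
 * TopMemoryContext, handing one row per context to a caller-supplied
 * callback.  In a real patch the callback would put values into a
 * tuplestore for a set-returning function.
 */
typedef void (*report_context_row) (MemoryContext context,
                                    const char *name,
                                    const char *ident,
                                    int depth,
                                    void *arg);

static void
memory_context_tree_walk(MemoryContext context, int depth,
                         report_context_row report, void *arg)
{
    MemoryContext child;

    /*
     * name and ident are what MemoryContextStatsInternal prints today;
     * byte totals would additionally come from the context's stats method.
     */
    report(context, context->name, context->ident, depth, arg);

    for (child = context->firstchild; child != NULL; child = child->nextchild)
        memory_context_tree_walk(child, depth + 1, report, arg);
}

/* e.g. memory_context_tree_walk(TopMemoryContext, 0, collect_row, state); */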
> The general context information probably should be something like:
> context_name, context_ident,
> context_total_bytes, context_total_blocks, context_total_freespace, context_total_freechunks, context_total_used, context_total_children
> context_self_bytes, context_self_blocks, context_self_freespace, context_self_freechunks, context_self_used, context_self_children,
>
> It might make sense to have said function return a row for the contexts
> it encounters that do not have an owner set too (that way we'd e.g. get
> CacheMemoryContext handled), but then still recurse.
A nice way to learn about the internals of the server and to analyze the
effects of memory-reducing enhancements.
> Arguably the proposed owning_object field would be a bit redundant with
> the already existing ident/MemoryContextSetIdentifier field, which
> e.g. already associates the query string with the contexts used for a
> prepared statement. But I'm not convinced that's going to be enough
> context in a lot of cases, because e.g. for prepared statements it could
> be interesting to have access to both the prepared statement name, and
> the statement.
The identifier seems to be more like a category at the moment; it does not
seem to hold any information specific to the object in question, so a more
specific name would be nice.
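
For reference, the plan cache already labels a saved statement's context
with the query text via MemoryContextSetIdentifier() (the string is not
copied, so it must live at least as long as the context). The
owning_object part below is purely an assumption to illustrate Andres'
idea; nothing like it exists today:

#include "postgres.h"
#include "utils/memutils.h"
#include "nodes/nodes.h"

/*
 * Sketch: labelling a prepared statement's context.  The
 * MemoryContextSetIdentifier() call mirrors what plancache.c already does
 * with the query string; the owning_object assignment is hypothetical and
 * would not compile against current headers.
 */
static void
label_statement_context(MemoryContext stmt_context,
                        const char *query_string,   /* must outlive the context */
                        Node *owner)                /* hypothetical owner node */
{
    MemoryContextSetIdentifier(stmt_context, query_string);

    /* proposed addition, not existing API: */
    /* stmt_context->owning_object = owner; */
    (void) owner;
}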
> The reason I like something like this is that we wouldn't add new
> columns to a number of views, and lack views to associate such
> information to for some objects. And it'd be disproportional to add all
> the information to numerous places anyway.
I understand your argument, but things like Cursors and Portals are rather
short-lived, while prepared statements seem to be the place where memory
really builds up.
> One counter-argument is that it'd be more expensive to get information
> specific to prepared statements (or other object types) that way. I'm
> not sure I buy that that's a problem - this isn't something that's
> likely going to be used at a high frequency. But if it becomes a
> problem, we can add a function that starts that process at a distinct
> memory context (e.g. a function that does this just for a single
> prepared statement, identified by name) - but I'd not start there.
I also see no problem here, and with Konstantin Knizhnik's autoprepare I
wouldn't use this very often anyway, mostly just for monitoring purposes,
where I don't mind if my query is a bit more complex.
>> While being interesting I still believe monitoring the mem usage of
>> prepared statements is a bit more important than that of other objects
>> because of how they change memory consumption of the server without
>> using any DDL or configuration options and I am not aware of other
>> objects with the same properties, or are there some? And for the other
>> volatile objects like tables and indexes and their contents PostgreSQL
>> already has its information functions.
> Plenty other objects have that property. E.g. cursors. And for the
> catalog/relation/... caches it's even more pernicious - the client might
> have closed all its "handles", but we still use memory (and it's
> absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and
add another table for the output of MemoryContextStatsDetail? This has the
advantage that the single real memory indicator that is useful for end
users (answering the question: how much memory does my sh*t take up?)
lives in pg_prepared_statements, while the more intrinsic information goes
into a detail view.

Thinking about the latter, I am against such a table, at least in a form
that exposes information like context_total_freechunks, because that would
only be useful for us developers. Why should any end user care how many
free chunks there still are in a MemoryContext, unless he is working on C
extensions? It could just be a source of confusion for them.

Let's think about the goal this should have: the end user should be able
to monitor the memory consumption of the things he is in control of or
that could affect system performance. Should such a table automatically
aggregate some information? I think so. I would not add more than two
memory columns to the view, just mem_used and mem_reserved. And even
mem_used is questionable, because in his eyes only the memory he cannot
use for other things because of object x is important to him (that was the
reason I added just one column). He would even ask: WHY is there 50% more
memory reserved than used, and how can I optimize it? (That might lead to
more curious PostgreSQL developers, so it's maybe a plus.)
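
To spell out what I mean by those two columns in terms of the counters the
existing stats code already collects (a sketch, assuming the
MemoryContextCounters struct from nodes/memnodes.h; the column names are
my proposal, not existing API):

#include "postgres.h"
#include "nodes/memnodes.h"

/*
 * mem_reserved: everything the context has allocated in blocks from the OS.
 * mem_used: the part of that which is not currently sitting in free space.
 * The gap between the two is exactly the "50% more reserved than used" a
 * user would wonder about.
 */
static inline Size
mem_reserved(const MemoryContextCounters *counters)
{
    return counters->totalspace;
}

static inline Size
mem_used(const MemoryContextCounters *counters)
{
    return counters->totalspace - counters->freespace;
}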

Something that also clearly speaks FOR such a table and against my
proposal is that someone who cares about memory most likely cares about
ALL of his memory, and in that case monitoring prepared statements would
just be a small subset of the things to monitor. OK, I am defeated and
will rewrite my patch if the following proposal finds approval:

I would propose a table pg_mem_usage containing the columns object_class,
name, detail and mem_usage (rename them if that fits the style of the
other tables better). The name would be empty for some objects, like the
unnamed prepared statement, and the query strings would go into the detail
column. One could add a final "Other" row containing the memory that no
specific output line has accounted for. It could also contain lines for
Cursors and other stuff I am too much of a novice to think of here.
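
Purely to illustrate the shape of such a row (everything below is made up,
nothing of it exists in the tree):

#include "postgres.h"

/* Hypothetical shape of one pg_mem_usage row; all names are proposals. */
typedef struct PgMemUsageRow
{
    const char *object_class;   /* e.g. "prepared statement", "cursor", "other" */
    const char *name;           /* object name, or NULL for e.g. the unnamed statement */
    const char *detail;         /* e.g. the query string */
    Size        mem_usage;      /* bytes of the object's memory context subtree */
} PgMemUsageRow;

/* The final catch-all row: memory not attributed to any specific object. */
static PgMemUsageRow
make_other_row(Size unattributed_bytes)
{
    PgMemUsageRow row = {"Other", NULL, NULL, unattributed_bytes};

    return row;
}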

And last: a reason why we still need a child-parent relationship in this
table (and distinct this_ and total_ mem functions) is that prepared
statements start to use much more memory once the generic plan is stored
in them after a few uses. As a user I always assumed that preparing a
statement would already do all the work required to make it fast, but a
statement only becomes blazingly fast once the generic plan is available
(and used), and it would be nice to see for which statements that plan has
already been generated and is consuming its memory. I believe the reason
for this behaviour is the fear of excessive memory usage.
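
What I mean by the this_/total_ distinction is simply self versus subtree.
A minimal sketch, again only using the existing firstchild/nextchild
links; self_bytes() is a placeholder for whatever the per-context stats
callback reports for a single context:

#include "postgres.h"
#include "utils/memutils.h"

/* Placeholder: would come from the context's stats callback. */
extern Size self_bytes(MemoryContext context);

/*
 * total_bytes: self plus everything in child contexts -- which is where a
 * cached generic plan shows up once it has been built.
 */
static Size
total_bytes(MemoryContext context)
{
    Size        total = self_bytes(context);
    MemoryContext child;

    for (child = context->firstchild; child != NULL; child = child->nextchild)
        total += total_bytes(child);

    return total;
}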

On the other hand: the generic plan has already been created for the first
invocation of the prepared statement, so why not store it immediately? It
is a named statement for a reason: it is intended to be reused, even if
only twice, and since memory does not seem to be treated as a scarce
resource in this context, why not store the plan right away? That would
also remove the need for a hierarchy here.

Any comments?

Regards,
Daniel Migowski
