Re: Using ProcSignal to get memory context stats from a running backend

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using ProcSignal to get memory context stats from a running backend
Date: 2017-12-21 07:55:35
Message-ID: CAMsr+YGh+sso5N6Q+FmYHLWC=BPCzA+5GbhYZSGruj2d0c7Vvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21 December 2017 at 15:24, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:
> > There tons of callers to enlargeStringInfo, so a 'noerror' parameter
> would
> > be viable.
>
> Not sure what you mean with that sentence?
>

Mangled in editing and sent prematurely, disregard.

There are NOT tons of callers to enlargeStringInfo, so adding a parameter
that allowed it to return a failure result rather than ERROR on OOM seems
to be a reasonable option. But it relies on repalloc(), which will ERROR on
OOM. AFAICS we don't have "no error" variants of the memory allocation
routines and I'm not eager to add them. Especially since we can't trust
that we're not on misconfigured Linux where the OOM killer will go wild
instead of giving us a nice NULL result.

So I guess that means we should probably just do individual elog(...)s and
live with the ugliness of scraping the resulting mess out of the logs.
After all, a log destination that could possibly copy and modify the string
being logged a couple of times it's not a good idea to try to drop the
whole thing into the logs in one blob. And we can't trust things like
syslog with large messages.

I'll follow up with a revised patch that uses individual elog()s soon.

BTW, I also pgindented it in my tree, so it'll have formatting fixed up.
pgindent's handling of

static void printwrapper_stringinfo(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);

upsets me a little, since if I break the line it mangles it into

static void
printwrapper_stringinfo(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);

so it'll break line-length rules a little, but otherwise conform.

> But I'm not convinced it's worth it personally. If we OOM in response to a
> > ProcSignal request for memory context output, we're having pretty bad
> luck.
> > The output is 8k in my test. But even if it were a couple of hundred kb,
> > happening to hit OOM just then isn't great luck on modern systems with
> many
> > gigabytes of RAM.
>
> I've seen plenty memory dumps in the dozens to hundreds of
> megabytes. And imo such cases are more likely to invite use of this
> facility.
>

OK, that's more concerning then. Also impressively huge.

As written it's using MemoryContextStatsDetail(TopMemoryContext, 100) so it
shouldn't really be *getting* multimegabyte dumps, but I'm not thrilled by
hardcoding that. It doesn't merit a GUC though, so I'm not sure what to do
about it and figured I'd make it a "later" problem.

> > If that *does* happen, repalloc(...) will call
> > MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
> > our memory context dump anyway, albeit to stderr.
>
> That would still abort the query that might otherwise continue to work,
> so that seems no excuse.

Eh. It'd probably die soon enough anyway. But yeah, I agree it's not safe
to hope we can sling around up to a couple of hundred MB under presumed
memory pressure.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mikael Kjellström 2017-12-21 07:56:20 Re: !<space>= should give error?
Previous Message Andres Freund 2017-12-21 07:24:07 Re: Using ProcSignal to get memory context stats from a running backend