From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: (full) Memory context dump considered harmful
Date: 2015-08-20 21:33:47
Message-ID: 55D647BB.2070207@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 08/20/2015 11:04 PM, Stefan Kaltenbrunner wrote:
> On 08/20/2015 06:09 PM, Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>> I wonder if we should have a default of capping the dump to say 1k lines
>>> or such and only optionally do a full one.
>>
>> -1. It's worked like this for the last fifteen years or
>> thereabouts, and you're the first one to complain. I suspect some
>> weirdness in your logging setup, rather than any systemic problem
>> that we need to lobotomize our diagnostic output in order to
>> prevent.
>
> not sure what you consider weird in the logging setup here - the
> context dump is imho already borderline internal diagnostic output at
> a debug level (rather than something that makes sense to an average
> sysadmin), and there is no way to control it. But having the backend
> (like in our case) dump 2 million basically identical lines into a
> general logfile per event seems excessive and rather abusive towards
> the rest of the system (from an I/O perspective, for example, or from
> a log file post-processing tool perspective).

I've seen similar issues too, on systems with perfectly common logging
setups. So even if your logging setup were strange, it would not be the
main factor here.

>> (The reason I say "lobotomize" is that there's no particularly
>> good reason to assume that the first N lines will tell you what you
>> need to know. And the filter rule would have to be *very* stupid,
>> because we can't risk trying to allocate any additional memory to
>> track what we're doing here.)
>
> I do understand that there might be challenges there, but in the last
> 15 years machines got way faster and pg got way more capable, and some
> of those capabilities might need to be revisited in that regard - and
> while it is very nice that pg survives multiple OOM cases pretty
> nicely, I don't think it is entitled to put an imho unreasonable burden
> on the rest of the system by writing "insane" amounts of data.
>
> Just from a sysadmin perspective it also means that it is trivial for
> a misbehaving app to fill up the logfile on a system, because unlike
> almost all other logging output there seems to be no way to
> control/disable it on a per-role/per-database level.

IMHO the first thing we might do is provide a log_memory_stats GUC
controlling that. I'm not a big fan of adding a GUC for everything, but
in this case it seems appropriate, just like the other log_* options.

I also don't think logging just a subset of the stats is a lost cause.
Sure, we can't know which of the lines are important, but logging just
the top-level contexts with a summary of their child contexts would be
OK. So something like this:

TopMemoryContext: 582728880 total in 71126 blocks; 6168 free ...
TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 ...
3 child contexts, 10 blocks, 34873 free
PL/pgSQL function context: 8192 total in 1 blocks; 1152 fre ...
PL/pgSQL function context: 24576 total in 2 blocks; 11400 f ...
Type information cache: 24576 total in 2 blocks; 11888 free ...
PL/pgSQL function context: 8192 total in 1 blocks; 1120 fre ...
PL/pgSQL function context: 24576 total in 2 blocks; 10984 f ...
PL/pgSQL function context: 57344 total in 3 blocks; 29928 f ...
PL/pgSQL function context: 57344 total in 3 blocks; 28808 f ...
PL/pgSQL function context: 24576 total in 2 blocks; 5944 fr ...
RI compare cache: 24576 total in 2 blocks; 15984 free (5 ch ...
RI query cache: 24576 total in 2 blocks; 11888 free (5 chun ...
PL/pgSQL function context: 57344 total in 3 blocks; 31832 f ...
PL/pgSQL function context: 57344 total in 3 blocks; 29600 f ...
PL/pgSQL function context: 57344 total in 3 blocks; 39688 f ...
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 65 ...
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free ...
PLpgSQL function cache: 24520 total in 2 blocks; 3744 free ...
Prepared Queries: 125886512 total in 25 blocks; 4764208 fre ...
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chun ...
Operator lookup cache: 24576 total in 2 blocks; 11888 free ...
MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks ...
Operator class cache: 8192 total in 1 blocks; 1680 free (0 ...
smgr relation table: 24576 total in 2 blocks; 5696 free (4 ...
TransactionAbortContext: 32768 total in 1 blocks; 32736 fre ...
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); ...
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); ...
4 child contexts, 34 blocks, 87283 free
Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chu ...
CacheMemoryContext: 42236592 total in 28 blocks; 7160904 fr ...
1487438 child contexts, 14874382 blocks, 3498349 free

instead of the millions of lines with all the details.
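
Just to illustrate, here is a minimal self-contained C sketch of that
idea (not actual backend code - FakeContext and all the names below are
made up for illustration): print contexts in full down to a fixed level,
and collapse everything deeper into a single summary line per context:

#include <stdio.h>
#include <stddef.h>

/* Made-up stand-in for a memory context node (illustration only). */
typedef struct FakeContext
{
    const char *name;
    size_t      totalspace;         /* total bytes allocated */
    int         nblocks;            /* number of blocks */
    size_t      freespace;          /* bytes free within those blocks */
    struct FakeContext *firstchild;
    struct FakeContext *nextchild;
} FakeContext;

/* Sum up an entire subtree: the context itself plus all descendants. */
static void
summarize_subtree(const FakeContext *ctx, long *ncontexts,
                  long *nblocks, long *freespace)
{
    const FakeContext *child;

    *ncontexts += 1;
    *nblocks += ctx->nblocks;
    *freespace += (long) ctx->freespace;
    for (child = ctx->firstchild; child != NULL; child = child->nextchild)
        summarize_subtree(child, ncontexts, nblocks, freespace);
}

/*
 * Print contexts in full down to max_level; below that, emit one summary
 * line covering all descendants of the context.
 */
static void
print_stats(const FakeContext *ctx, int level, int max_level)
{
    const FakeContext *child;

    printf("%*s%s: %zu total in %d blocks; %zu free\n",
           level * 2, "", ctx->name,
           ctx->totalspace, ctx->nblocks, ctx->freespace);

    if (level < max_level)
    {
        for (child = ctx->firstchild; child != NULL; child = child->nextchild)
            print_stats(child, level + 1, max_level);
    }
    else
    {
        long        ncontexts = 0;
        long        nblocks = 0;
        long        freespace = 0;

        for (child = ctx->firstchild; child != NULL; child = child->nextchild)
            summarize_subtree(child, &ncontexts, &nblocks, &freespace);

        if (ncontexts > 0)
            printf("%*s%ld child contexts, %ld blocks, %ld free\n",
                   (level + 1) * 2, "", ncontexts, nblocks, freespace);
    }
}

int
main(void)
{
    /* Tiny hand-built tree, roughly mimicking the dump above. */
    FakeContext rel_a = {"relcache entry A", 8192, 1, 1024, NULL, NULL};
    FakeContext rel_b = {"relcache entry B", 8192, 1, 2048, NULL, NULL};
    FakeContext cache = {"CacheMemoryContext", 42236592, 28, 7160904, NULL, NULL};
    FakeContext top   = {"TopMemoryContext", 582728880, 71126, 6168, NULL, NULL};

    cache.firstchild = &rel_a;
    rel_a.nextchild = &rel_b;
    top.firstchild = &cache;

    /* Full detail for levels 0 and 1, summaries for everything deeper. */
    print_stats(&top, 0, 1);
    return 0;
}

The sketch needs nothing beyond the stack while walking the tree, so the
"no additional memory" constraint Tom mentioned above should still be
satisfiable.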

So either

log_memory_stats = (on|off)

or

log_memory_stats = (full|short|off)

might work.
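
With the enum variant, the setting could then presumably also be scoped
per database or role (which would address the point above about a
misbehaving app), assuming the GUC were defined with a context that
allows it. Something like this (hypothetical - no such GUC exists yet,
and the database/role names are made up):

# postgresql.conf
log_memory_stats = short

-- or per database / role
ALTER DATABASE noisy_app_db SET log_memory_stats = 'off';
ALTER ROLE noisy_app SET log_memory_stats = 'off';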

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
