(full) Memory context dump considered harmful

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Postgresql Hackers Mailinglist <pgsql-hackers(at)postgresql(dot)org>
Subject: (full) Memory context dump considered harmful
Date: 2015-08-20 15:51:01
Message-ID: 55D5F765.30901@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all!

We just had a case of a very long running process of ours that creates
does a lot of prepared statements through Perls DBD:Pg running into:

https://rt.cpan.org/Public/Bug/Display.html?id=88827

This resulted in millions of prepared statements created, but not
removed in the affected backends over the course of 1-2 hours until the
backends in question ran out of memory.
The "out of memory" condition resulted in one memory context dump
generated per occurance each consisting of >2M lines(!) (basically a
line of CachedPlan/CachePlanSource per statement/function).
In the 20 Minutes or it took monitoring to alert and operations to react
this cause a followup incident because repeated "out of memory"
conditions caused over 400M(!!) loglines amounting to some 15GB of data
running the log partition dangerously close to full.

an example memory context dump looks like this:

TopMemoryContext: 582728880 total in 71126 blocks; 6168 free (52
chunks); 582722712 used
TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 chunks);
2096 used
ExecutorState: 8192 total in 1 blocks; 5392 free (0 chunks); 2800 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
SPI Proc: 8192 total in 1 blocks; 5416 free (0 chunks); 2776 used
PL/pgSQL function context: 8192 total in 1 blocks; 1152 free (1
chunks); 7040 used
PL/pgSQL function context: 24576 total in 2 blocks; 11400 free (1
chunks); 13176 used
Type information cache: 24576 total in 2 blocks; 11888 free (5
chunks); 12688 used
PL/pgSQL function context: 8192 total in 1 blocks; 1120 free (1
chunks); 7072 used
PL/pgSQL function context: 24576 total in 2 blocks; 10984 free (1
chunks); 13592 used
PL/pgSQL function context: 57344 total in 3 blocks; 29928 free (2
chunks); 27416 used
PL/pgSQL function context: 57344 total in 3 blocks; 28808 free (2
chunks); 28536 used
PL/pgSQL function context: 24576 total in 2 blocks; 5944 free (3
chunks); 18632 used
RI compare cache: 24576 total in 2 blocks; 15984 free (5 chunks);
8592 used
RI query cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688
used
PL/pgSQL function context: 57344 total in 3 blocks; 31832 free (2
chunks); 25512 used
PL/pgSQL function context: 57344 total in 3 blocks; 29600 free (2
chunks); 27744 used
PL/pgSQL function context: 57344 total in 3 blocks; 39688 free (5
chunks); 17656 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0
chunks); 6512 used
PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0
chunks); 20776 used
Prepared Queries: 125886512 total in 25 blocks; 4764208 free (91
chunks); 121122304 used
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5
chunks); 12688 used
MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks);
6512 used
smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks);
18880 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0
chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
ExecutorState: 57344 total in 3 blocks; 21856 free (2 chunks);
35488 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chunks);
11744 used
CacheMemoryContext: 42236592 total in 28 blocks; 7160904 free (298
chunks); 35075688 used
CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks);
3264 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks);
5816 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks);
5816 used
CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks);
3264 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks);
5816 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks);
5816 used
CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks);
3264 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks);
5816 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks);
5816 used
CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks);
3264 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks);
5816 used
CachedPlan: 7168 total in 3 blocks; 2248 free (0 chunks); 4920 used
CachedPlanSource: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used
CachedPlanSource: 7168 total in 3 blocks; 1888 free (0 chunks);
5280 used
CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used
CachedPlanSource: 7168 total in 3 blocks; 1888 free (0 chunks);
5280 used
CachedPlan: 7168 total in 3 blocks; 2248 free (0 chunks); 4920 used
CachedPlanSource: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used

... goes of for some 2M lines...

given that an "out of memory" situation is already bad enough, but
essentially mounting a DoS on the rest of the system due to creating
excessive amounts of logs feels even worse.

I wonder if we should have a default of capping the dump to say 1k lines
or such and only optionally do a full one.
the other option would be to find an alternative way of displaying the
information like if there is a large number of repeated entries just
show the first X of them and say "123456 similiar ones".

This is 9.1.14 on Debian Wheezy/amd64 fwiw - but I dont think we have
made relevant changes in more recent versions.

regards

Stefan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2015-08-20 15:51:54 Re: Declarative partitioning
Previous Message Greg Stark 2015-08-20 15:42:21 Re: PostgreSQL for VAX on NetBSD/OpenBSD