Re: memory problems and crash of db when deleting data from table with thousands of partitions

From: Josef Machytka <josef(dot)machytka(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: memory problems and crash of db when deleting data from table with thousands of partitions
Date: 2019-10-28 15:18:59
Message-ID: CAGvVEFsTp8cqALqPaOVxXaXMXx9V6TXtCAGobvWBb-ezjDbRtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thank you for your email, FYI - we now did tests with PostgreSQL 12 and
unfortunately it is also not able to handle to our case. Only difference is
that PG 12 is not killed by OOM killer and even does not crash - which is
good. But it reports error "out of memory" and stops the statement. So at
least it looks like much more stable then PG 11.

Regards

Josef Machytka

from postgresql log:

TopMemoryContext: 6685688 total in 213 blocks; 1229064 free (32 chunks);
5456624 used
pgstat TabStatusArray lookup hash table: 1048576 total in 8 blocks;
230568 free (17 chunks); 818008 used
TopTransactionContext: 8192 total in 1 blocks; 7744 free (2 chunks); 448
used
Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks);
21728 used
HandleParallelMessages: 8192 total in 1 blocks; 7936 free (0 chunks); 256
used
TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
Operator lookup cache: 24576 total in 2 blocks; 10760 free (3 chunks);
13816 used
RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296
used
MessageContext: 52197329840 total in 8274 blocks; 75904 free (9 chunks);
52197253936 used
partition directory: 32768 total in 3 blocks; 8544 free (7 chunks);
24224 used
Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632
used
smgr relation table: 4194304 total in 10 blocks; 1597192 free (37
chunks); 2597112 used
TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks);
256 used
Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
TopPortalContext: 8192 total in 1 blocks; 7936 free (1 chunks); 256 used
Relcache by OID: 1048576 total in 8 blocks; 208888 free (16 chunks);
839688 used
CacheMemoryContext: 136577192 total in 29 blocks; 5037352 free (18
chunks); 131539840 used
index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used:
test_out_of_stock_default_v2__start_date_end_date_country_g_idx
index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used:
test_out_of_stock_zoot_defaul_start_date_end_date_country_g_idx
....
index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used:
test_out_of_stock_zalora_2018_start_date_end_date_country__idx1
19998 more child contexts containing 51415472 total in 39616 blocks;
12386152 free (10423 chunks); 39029320 used
WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks);
43400 used
PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used
MdSmgr: 524288 total in 7 blocks; 42464 free (2 chunks); 481824 used
LOCALLOCK hash: 2097152 total in 9 blocks; 100416 free (30 chunks);
1996736 used
Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used
ErrorContext: 8192 total in 1 blocks; 7936 free (3 chunks); 256 used
Grand total: 52401570368 bytes in 48395 blocks; 21121960 free (10700
chunks); 52380448408 used
2019-10-28 15:11:44.276 UTC [2746] upcload(at)queries ERROR: out of memory
2019-10-28 15:11:44.276 UTC [2746] upcload(at)queries DETAIL: Failed on
request of size 16 in memory context "MessageContext".
2019-10-28 15:11:44.276 UTC [2746] upcload(at)queries STATEMENT: ......

On Wed, 23 Oct 2019 at 14:47, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Wed, Oct 23, 2019 at 01:46:23PM +0200, Josef Machytka wrote:
> >Hi people,
> >
> >I know this is actually known problem (
> >
> https://stackoverflow.com/questions/49291451/postgres-10-3-heavily-partitioned-table-and-cannot-delete-any-records/58521850#58521850
> ).
> >
> >
> >I would just like to add my voice and description of use case to this
> >topic. If this could be repaired it would be amazing because we use new
> >native partitioning really a lot in our company and we like it - well, not
> >counting this problem into it....
> >
> >I have this problem on PostgreSQL 11 (PostgreSQL 11.5 (Debian
> >11.5-3.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian
> >6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit) with partitioned table having
> >more levels of partitioning. Main table is partitioned by shops and each
> >shops months (year-month) over several past years. Together several
> >thousands of partitions and number is still growing.
> >
> >When we simply insert new data (which is usual operation we do) all is
> >perfect. But lately we needed to delete some wrong data over all
> partitions
> >and PostgreSQL started to crash during this operation.
> >
> >Crashes are always the same - PostgreSQL starts to use more and more
> memory
> >and eventually is killed by OOM killer. I tried to fiddle with work_mem
> and
> >other settings - nothing, database just crashes a bit later but crashes
> >anyway.
> >
>
> Yeah, I think this is a known issue - there are cases where we're not
> smart enough and end up opening/locking all the partitions, resulting in
> excessive memory consumption (and OOM). I suppose this is one of those
> cases, but I'd have to see memory context stats to know for sure.
>
> Unfortunately, that's a design issue, and it's not going to be fixed in
> backbranches. We're improving this - perhaps PostgreSQL 12 would
> improve the behavior in your case, and hopefully 13 will do even better.
>
> >Of course workaround works - I can use script to do deletion or update
> over
> >each shop partition separately. There are only several dozens of monthly
> >partitions for each shop so it work perfectly. But anyway if problem would
> >be repaired and simple delete/update over top main table would be possible
> >this would be much better.
> >
>
> Right, that's a reasonable workaround.
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Stephen Frost 2019-10-28 15:47:54 Re: BUG #16079: Question Regarding the BUG #16064
Previous Message william allen 2019-10-28 14:28:59 RE: BUG #15858: could not stat file - over 4GB