Re: Out of Memory errors are frustrating as heck!

From: Gunther <raj(at)gusw(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-08-23 14:19:51
Message-ID: bc6187e8-fdc2-a567-56ac-3a249dc2d8d6@gusw.net
Lists: pgsql-performance

OK, I went back through that old thread, and I noticed an early suggestion
by a certain Peter <pmc at citylink> that I should provision some swap
space. Since I had plenty of disk and no other option, I tried that, and it
worked some magic. Here is the steady state now:

top - 14:07:32 up 103 days, 9:57, 5 users, load average: 1.33, 1.05, 0.54
Tasks: 329 total, 2 running, 117 sleeping, 0 stopped, 0 zombie
%Cpu(s): 31.0 us, 11.4 sy, 0.0 ni, 35.3 id, 22.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7910376 total, 120524 free, 2174940 used, 5614912 buff/cache
KiB Swap: 16777212 total, 16777212 free, 0 used. 3239724 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5068 postgres 20 0 4352496 4.0g 2.0g R 76.4 52.6 3:01.39 postgres: postgres integrator [local] INSERT
435 root 20 0 0 0 0 S 4.0 0.0 10:52.38 [kswapd0]

and the nice thing is, the backend server process appears to be bounded
at 4 GB, so there isn't really a "memory leak". Also, the swap space
isn't actually being used. This probably has to do with the vm.* overcommit
sysctl settings (my notes are below, and a quick way to check what the
kernel computes from them follows the list):

* vm.overcommit_memory = 2 -- values are
  o 0 -- heuristic: estimate free memory
  o 1 -- always assume there is enough memory
  o 2 -- no over-commit; allocations are allowed only within the limit
    set by one of the following two parameters
* vm.overcommit_kbytes = 0 -- how many kB above swap can be committed,
  EITHER this OR
* vm.overcommit_ratio = 50 -- percent of main memory that can be
  committed on top of swap
  o with 0 swap, only that percent of RAM can be committed
  o i.e., of my 8 GB only 4 GB could be committed, with the other 4 GB
    effectively reserved for buffer cache
  o probably not a good idea
  o at least 75% committed should be allowed, i.e., 6 GB of 8 GB, which
    would leave room for
    + 2 GB of buffer cache
    + 2 GB of shared buffers
    + 4 GB of all other memory

I have vm.overcommit_memory = 2, _kbytes = 0, _ratio = 50. This means that
with _ratio = 50 I can commit 50% of main memory (4 GB) on top of swap,
and 4 GB is exactly what the server process wants. So with little impact
on the available buffer cache I am in a fairly good position now. The swap
(which in my case I set at 2 x main memory = 16 GB) serves as a buffer to
smooth out this peak usage without ever actually paging.

I suppose even without swap I could have set vm.overcommit_ratio = 75,
and I notice now that I had already written as much in my notes (the
bullet points above are my own).
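
If I ever drop the swap again and go that route instead, the change itself
would presumably be no more than this (standard sysctl mechanics; the
sysctl.d file name is just an example, and I have not actually run my
workload this way):

    # allow committing 75% of RAM (6 GB of 8 GB) instead of 50%
    sysctl -w vm.overcommit_ratio=75

    # persist it across reboots
    echo 'vm.overcommit_ratio = 75' >> /etc/sysctl.d/90-overcommit.conf
    sysctl --system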

Anyway, for now, I am good. Thank you very much.

regards,
-Gunther

On 8/23/2019 9:17, Gunther wrote:
>
> Hi all, I am connecting to a discussion back from April this year. My
> data has grown and now I am running into new out of memory situations.
> Meanwhile the world turned from 11.2 to 11.5 which I just installed
> only to find the same out of memory error.
>
> Have any of the things discussed and proposed, especially this last
> one by Tomas Vondra, been applied to the 11 releases? Should I try
> these older patches from April?
>
> regards,
> -Gunther
>
> For what it is worth, this is what I am getting:
>
> TopMemoryContext: 67424 total in 5 blocks; 7184 free (7 chunks); 60240
> used pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks;
> 416 free (0 chunks); 7776 used TopTransactionContext: 8192 total in 1
> blocks; 7720 free (1 chunks); 472 used Operator lookup cache: 24576
> total in 2 blocks; 10760 free (3 chunks); 13816 used TableSpace cache:
> 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used Type
> information cache: 24352 total in 2 blocks; 2624 free (0 chunks);
> 21728 used RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0
> chunks); 1296 used MessageContext: 8388608 total in 11 blocks; 3094872
> free (4 chunks); 5293736 used JoinRelHashTable: 16384 total in 2
> blocks; 5576 free (1 chunks); 10808 used Operator class cache: 8192
> total in 1 blocks; 560 free (0 chunks); 7632 used smgr relation table:
> 32768 total in 3 blocks; 12720 free (8 chunks); 20048 used
> TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0
> chunks); 256 used Portal hash: 8192 total in 1 blocks; 560 free (0
> chunks); 7632 used TopPortalContext: 8192 total in 1 blocks; 7664 free
> (0 chunks); 528 used PortalContext: 1024 total in 1 blocks; 624 free
> (0 chunks); 400 used: ExecutorState: 202528536 total in 19 blocks;
> 433464 free (12 chunks); 202095072 used HashTableContext: 8192 total
> in 1 blocks; 7656 free (0 chunks); 536 used HashBatchContext: 10615104
> total in 261 blocks; 7936 free (0 chunks); 10607168 used
> HashTableContext: 8192 total in 1 blocks; 7688 free (1 chunks); 504
> used HashBatchContext: 13079304 total in 336 blocks; 7936 free (0
> chunks); 13071368 used TupleSort main: 49208 total in 3 blocks; 8552
> free (7 chunks); 40656 used Caller tuples: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used Subplan HashTable Temp Context: 1024
> total in 1 blocks; 768 free (0 chunks); 256 used Subplan HashTable
> Context: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> Subplan HashTable Temp Context: 1024 total in 1 blocks; 768 free (0
> chunks); 256 used Subplan HashTable Context: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used Subplan HashTable Temp Context: 1024
> total in 1 blocks; 768 free (0 chunks); 256 used Subplan HashTable
> Context: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> Subplan HashTable Temp Context: 1024 total in 1 blocks; 768 free (0
> chunks); 256 used Subplan HashTable Context: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks;
> 7936 free (0 chunks); 256 used Subplan HashTable Temp Context: 1024
> total in 1 blocks; 768 free (0 chunks); 256 used Subplan HashTable
> Context: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7360 free (0 chunks); 832 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
> ExprContext: 1107296256 total in 142 blocks; 6328 free (101 chunks);
> 1107289928 used ExprContext: 8192 total in 1 blocks; 7936 free (0
> chunks); 256 used 1 more child contexts containing 8192 total in 1
> blocks; 7936 free (0 chunks); 256 used Relcache by OID: 16384 total in
> 2 blocks; 2472 free (2 chunks); 13912 used CacheMemoryContext: 1113488
> total in 14 blocks; 16776 free (0 chunks); 1096712 used index info:
> 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
> docsubjh_sjrcode_ndx index info: 1024 total in 1 blocks; 48 free (0
> chunks); 976 used: docsubjh_sjrclass_ndx index info: 1024 total in 1
> blocks; 48 free (0 chunks); 976 used: docsubjh_scopeiid_ndx index
> info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
> docsubjh_dociid_ndx index info: 4096 total in 3 blocks; 2064 free (2
> chunks); 2032 used: role_telecom_idx index info: 2048 total in 2
> blocks; 968 free (1 chunks); 1080 used: role_addr_fkidx index info:
> 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: role_id_fkidx
> index info: 2048 total in 2 blocks; 696 free (1 chunks); 1352 used:
> role_id_idx index info: 2048 total in 2 blocks; 968 free (1 chunks);
> 1080 used: role_name_fkidx index info: 4096 total in 3 blocks; 2064
> free (2 chunks); 2032 used: entity_telecom_idx index info: 2048 total
> in 2 blocks; 968 free (1 chunks); 1080 used: entity_id_fkidx index
> info: 2048 total in 2 blocks; 696 free (1 chunks); 1352 used:
> entity_id_idx index info: 2048 total in 2 blocks; 624 free (1 chunks);
> 1424 used: entity_det_code_idx index info: 4096 total in 3 blocks;
> 2016 free (2 chunks); 2080 used: entity_code_nodash_idx index info:
> 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: entity_pkey
> index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used:
> connect_rule_pkey index info: 2048 total in 2 blocks; 952 free (1
> chunks); 1096 used: role_context_idx index info: 2048 total in 2
> blocks; 640 free (2 chunks); 1408 used: role_partitions index info:
> 2048 total in 2 blocks; 640 free (2 chunks); 1408 used:
> role_scoper_idx index info: 2048 total in 2 blocks; 640 free (2
> chunks); 1408 used: role_player_idx index info: 2048 total in 2
> blocks; 968 free (1 chunks); 1080 used: role__pkey index info: 2048
> total in 2 blocks; 680 free (1 chunks); 1368 used: pg_toast_2619_index
> index info: 2048 total in 2 blocks; 592 free (1 chunks); 1456 used:
> pg_constraint_conrelid_contypid_conname_index index info: 2048 total
> in 2 blocks; 624 free (1 chunks); 1424 used: participation_act_idx
> index info: 2048 total in 2 blocks; 624 free (1 chunks); 1424 used:
> participation_role_idx index info: 2048 total in 2 blocks; 952 free (1
> chunks); 1096 used: participation_pkey index info: 1024 total in 1
> blocks; 48 free (0 chunks); 976 used: pg_statistic_ext_relid_index
> index info: 2048 total in 2 blocks; 624 free (1 chunks); 1424 used:
> doc_ndx_internaiddoctype index info: 2048 total in 2 blocks; 680 free
> (1 chunks); 1368 used: pg_toast_2618_index index info: 2048 total in 2
> blocks; 952 free (1 chunks); 1096 used: pg_index_indrelid_index
> relation rules: 827392 total in 104 blocks; 2400 free (1 chunks);
> 824992 used: v_documentsubjecthistory index info: 2048 total in 2
> blocks; 648 free (2 chunks); 1400 used:
> pg_db_role_setting_databaseid_rol_index index info: 2048 total in 2
> blocks; 624 free (2 chunks); 1424 used: pg_opclass_am_name_nsp_index
> index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used:
> pg_foreign_data_wrapper_name_index index info: 1024 total in 1 blocks;
> 48 free (0 chunks); 976 used: pg_enum_oid_index index info: 2048 total
> in 2 blocks; 680 free (2 chunks); 1368 used:
> pg_class_relname_nsp_index index info: 1024 total in 1 blocks; 48 free
> (0 chunks); 976 used: pg_foreign_server_oid_index index info: 1024
> total in 1 blocks; 48 free (0 chunks); 976 used:
> pg_publication_pubname_index index info: 2048 total in 2 blocks; 592
> free (3 chunks); 1456 used: pg_statistic_relid_att_inh_index index
> info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used:
> pg_cast_source_target_index index info: 1024 total in 1 blocks; 48
> free (0 chunks); 976 used: pg_language_name_index index info: 1024
> total in 1 blocks; 48 free (0 chunks); 976 used:
> pg_transform_oid_index index info: 1024 total in 1 blocks; 48 free (0
> chunks); 976 used: pg_collation_oid_index index info: 3072 total in 2
> blocks; 1136 free (2 chunks); 1936 used: pg_amop_fam_strat_index index
> info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
> pg_index_indexrelid_index index info: 2048 total in 2 blocks; 760 free
> (2 chunks); 1288 used: pg_ts_template_tmplname_index index info: 2048
> total in 2 blocks; 704 free (3 chunks); 1344 used:
> pg_ts_config_map_index index info: 2048 total in 2 blocks; 952 free (1
> chunks); 1096 used: pg_opclass_oid_index index info: 1024 total in 1
> blocks; 16 free (0 chunks); 1008 used:
> pg_foreign_data_wrapper_oid_index index info: 1024 total in 1 blocks;
> 48 free (0 chunks); 976 used: pg_event_trigger_evtname_index index
> info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
> pg_statistic_ext_name_index index info: 1024 total in 1 blocks; 48
> free (0 chunks); 976 used: pg_publication_oid_index index info: 1024
> total in 1 blocks; 48 free (0 chunks); 976 used: pg_ts_dict_oid_index
> index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
> pg_event_trigger_oid_index index info: 3072 total in 2 blocks; 1216
> free (3 chunks); 1856 used: pg_conversion_default_index index info:
> 3072 total in 2 blocks; 1136 free (2 chunks); 1936 used:
> pg_operator_oprname_l_r_n_index index info: 2048 total in 2 blocks;
> 680 free (2 chunks); 1368 used: pg_trigger_tgrelid_tgname_index index
> info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
> pg_enum_typid_label_index index info: 1024 total in 1 blocks; 48 free
> (0 chunks); 976 used: pg_ts_config_oid_index index info: 1024 total in
> 1 blocks; 48 free (0 chunks); 976 used: pg_user_mapping_oid_index
> index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used:
> pg_opfamily_am_name_nsp_index index info: 1024 total in 1 blocks; 48
> free (0 chunks); 976 used: pg_foreign_table_relid_index index info:
> 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
> pg_type_oid_index index info: 2048 total in 2 blocks; 952 free (1
> chunks); 1096 used: pg_aggregate_fnoid_index index info: 1024 total in
> 1 blocks; 48 free (0 chunks); 976 used: pg_constraint_oid_index index
> info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used:
> pg_rewrite_rel_rulename_index index info: 2048 total in 2 blocks; 760
> free (2 chunks); 1288 used: pg_ts_parser_prsname_index index info:
> 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
> pg_ts_config_cfgname_index index info: 1024 total in 1 blocks; 48 free
> (0 chunks); 976 used: pg_ts_parser_oid_index index info: 2048 total in
> 2 blocks; 728 free (1 chunks); 1320 used:
> pg_publication_rel_prrelid_prpubid_index index info: 2048 total in 2
> blocks; 952 free (1 chunks); 1096 used: pg_operator_oid_index index
> info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
> pg_namespace_nspname_index index info: 1024 total in 1 blocks; 48 free
> (0 chunks); 976 used: pg_ts_template_oid_index index info: 2048 total
> in 2 blocks; 624 free (2 chunks); 1424 used: pg_amop_opr_fam_index
> index info: 2048 total in 2 blocks; 672 free (3 chunks); 1376 used:
> pg_default_acl_role_nsp_obj_index index info: 2048 total in 2 blocks;
> 704 free (3 chunks); 1344 used: pg_collation_name_enc_nsp_index index
> info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
> pg_publication_rel_oid_index index info: 1024 total in 1 blocks; 48
> free (0 chunks); 976 used: pg_range_rngtypid_index index info: 2048
> total in 2 blocks; 760 free (2 chunks); 1288 used:
> pg_ts_dict_dictname_index index info: 2048 total in 2 blocks; 680 free
> (2 chunks); 1368 used: pg_type_typname_nsp_index index info: 1024
> total in 1 blocks; 48 free (0 chunks); 976 used: pg_opfamily_oid_index
> index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
> pg_statistic_ext_oid_index index info: 2048 total in 2 blocks; 952
> free (1 chunks); 1096 used: pg_class_oid_index index info: 2048 total
> in 2 blocks; 624 free (2 chunks); 1424 used:
> pg_proc_proname_args_nsp_index index info: 1024 total in 1 blocks; 16
> free (0 chunks); 1008 used: pg_partitioned_table_partrelid_index index
> info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
> pg_transform_type_lang_index index info: 2048 total in 2 blocks; 680
> free (2 chunks); 1368 used: pg_attribute_relid_attnum_index index
> info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
> pg_proc_oid_index index info: 1024 total in 1 blocks; 48 free (0
> chunks); 976 used: pg_language_oid_index index info: 1024 total in 1
> blocks; 48 free (0 chunks); 976 used: pg_namespace_oid_index index
> info: 3072 total in 2 blocks; 1136 free (2 chunks); 1936 used:
> pg_amproc_fam_proc_index index info: 1024 total in 1 blocks; 48 free
> (0 chunks); 976 used: pg_foreign_server_name_index index info: 2048
> total in 2 blocks; 760 free (2 chunks); 1288 used:
> pg_attribute_relid_attnam_index index info: 1024 total in 1 blocks; 48
> free (0 chunks); 976 used: pg_conversion_oid_index index info: 2048
> total in 2 blocks; 728 free (1 chunks); 1320 used:
> pg_user_mapping_user_server_index index info: 2048 total in 2 blocks;
> 728 free (1 chunks); 1320 used:
> pg_subscription_rel_srrelid_srsubid_index index info: 1024 total in 1
> blocks; 48 free (0 chunks); 976 used: pg_sequence_seqrelid_index index
> info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
> pg_conversion_name_nsp_index index info: 2048 total in 2 blocks; 952
> free (1 chunks); 1096 used: pg_authid_oid_index index info: 2048 total
> in 2 blocks; 728 free (1 chunks); 1320 used:
> pg_auth_members_member_role_index 10 more child contexts containing
> 17408 total in 17 blocks; 6080 free (10 chunks); 11328 used WAL record
> construction: 49768 total in 2 blocks; 6368 free (0 chunks); 43400
> used PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks);
> 5568 used MdSmgr: 8192 total in 1 blocks; 6408 free (0 chunks); 1784
> used LOCALLOCK hash: 16384 total in 2 blocks; 4600 free (2 chunks);
> 11784 used Timezones: 104120 total in 2 blocks; 2624 free (0 chunks);
> 101496 used ErrorContext: 8192 total in 1 blocks; 7936 free (4
> chunks); 256 used Grand total: 1345345736 bytes in 1209 blocks;
> 4529600 free (270 chunks); 1340816136 used
> On 4/28/2019 10:19, Tomas Vondra wrote:
>> On Wed, Apr 24, 2019 at 02:36:33AM +0200, Tomas Vondra wrote:
>>>
>>> ...
>>>
>>> I still think the idea with an "overflow batch" is worth considering,
>>> because it'd allow us to keep the memory usage within work_mem. And
>>> after getting familiar with the hash join code again (haven't messed
>>> with it since 9.5 or so) I think it should not be all that difficult.
>>> I'll give it a try over the weekend if I get bored for a while.
>>>
>>
>> OK, so I took a stab at this, and overall it seems to be workable. The
>> patches I have are nowhere near committable, but I think the approach
>> works fairly well - the memory is kept in check, and the performance is
>> comparable to the "balancing" approach tested before.
>>
>> To explain it a bit, the idea is that we can compute how many BufFile
>> structures we can keep in memory - we can't use more than work_mem/2 for
>> that, because then we'd mostly eliminate space for the actual data. For
>> example with 4MB, we know we can keep 128 batches - we need 128 for
>> outer and inner side, so 256 in total, and 256*8kB = 2MB.
>>
>> And then, we just increase the number of batches but instead of adding
>> the BufFile entries, we split batches into slices that we can keep in
>> memory (say, the 128 batches). And we keep BufFiles for the current one
>> and an "overflow file" for the other slices. After processing a slice,
>> we simply switch to the next one, and use the overflow file as a temp
>> file for the first batch - we redistribute it into the other batches in
>> the slice and another overflow file.
>>
>> That's what the v3 patch (named 'single overflow file') does. It does
>> work, but unfortunately it significantly inflates the amount of data
>> written to temporary files. Assume we need e.g. 1024 batches, but only
>> 128 fit into memory. That means we'll need 8 slices, and during the
>> first pass we'll handle 1/8 of the data and write 7/8 to the overflow
>> file.  Then after processing the slice and switching to the next one, we
>> repeat this dance - 1/8 gets processed, 6/8 written to another overflow
>> file. So essentially we "forward" about
>>
>>    7/8 + 6/8 + 5/8 + ... + 1/8 = 28/8 = 3.5
>>
>> of data between slices, and we need to re-shuffle data in each slice,
>> which amounts to additional 1x data. That's pretty significant overhead,
>> as will be clear from the measurements I'll present shortly.
>>
>> But luckily, there's a simple solution to this - instead of writing the
>> data into a single overflow file, we can create one overflow file for
>> each slice. That will leave us with the ~1x of additional writes when
>> distributing data into batches in the current slice, but it eliminates
>> the main source of write amplification - avalanche-like forwarding of
>> data between slices.
>>
>> This relaxes the memory limit a bit again, because we can't really keep
>> the number of overflow files constrained by work_mem, but we should only
>> need a few of them (much less than when adding one file per batch right
>> away). For example with 128 in-memory batches, this reduces the amount
>> of necessary memory 128x.
>>
>> And this is what v4 (per-slice overflow file) does, pretty much.
>>
>>
>> Two more comments, regarding memory accounting in previous patches. It
>> was a bit broken, because we actually need 2x the number of BufFiles. We
>> needed nbatch files for outer side and nbatch files for inner side, but
>> we only considered one of those - both when deciding when to increase
>> the number of batches / increase spaceAllowed, and when reporting the
>> memory usage. So with large number of batches the reported amount of
>> used memory was roughly 1/2 of the actual value :-/
>>
>> The memory accounting was a bit bogus for another reason - spaceUsed
>> simply tracks the amount of memory for hash table contents. But at the
>> end we were simply adding the current space for BufFile stuff, ignoring
>> the fact that that's likely much larger than when the spacePeak value
>> got stored. For example we might have kept early spaceUsed when it was
>> almost work_mem, and then added the final large BufFile allocation.
>>
>> I've fixed both issues in the patches attached to this message. It does
>> not make a huge difference in practice, but it makes it easier to
>> compare values between patches.
>>
>>
>> Now, some test results - I've repeated the simple test with uniform data
>> set, which is pretty much ideal for hash joins (no unexpectedly large
>> batches that can't be split, etc.). I've done this with 1M, 5M, 10M, 25M
>> and 50M rows in the large table (which gets picked for the "hash" side),
>> and measured how much memory gets used, how many batches, how long it
>> takes and how much data gets written to temp files.
>>
>> See the hashjoin-test.sh script for more details.
>>
>> So, here are the results with work_mem = 4MB (so the number of in-memory
>> batches for the last two entries is 128). The columns are:
>>
>> * nbatch - the final number of batches
>> * memory - memory usage, as reported by explain analyze
>> * time - duration of the query (without explain analyze) in seconds
>> * size - size of the large table
>> * temp - amount of data written to temp files
>> * amplif - write amplification (temp / size)
>>
>>
>>  1M rows
>>  ===================================================================
>>                  nbatch  memory   time  size (MB)  temp (MB) amplif
>>  -------------------------------------------------------------------
>>  master             256    7681    3.3        730        899 1.23
>>  rebalance          256    7711    3.3        730        884 1.21
>>  single file       1024    4161    7.2        730       3168 4.34
>>  per-slice file    1024    4161    4.7        730       1653 2.26
>>
>>
>>  5M rows
>>  ===================================================================
>>                  nbatch  memory   time  size (MB)  temp (MB) amplif
>>  -------------------------------------------------------------------
>>  master            2048   36353     22       3652       5276 1.44
>>  rebalance          512   16515     18       3652       4169 1.14
>>  single file       4096    4353    156       3652      53897 14.76
>>  per-slice file    4096    4353     28       3652       8106 2.21
>>
>>
>>  10M rows
>>  ===================================================================
>>                  nbatch  memory   time  size (MB)  temp (MB) amplif
>>  -------------------------------------------------------------------
>>  master            4096   69121     61       7303      10556 1.45
>>  rebalance          512   24326     46       7303       7405 1.01
>>  single file       8192    4636    762       7303     211234 28.92
>>  per-slice file    8192    4636     65       7303      16278 2.23
>>
>>
>>  25M rows
>>  ===================================================================
>>                  nbatch  memory   time  size (MB)  temp (MB) amplif
>>  -------------------------------------------------------------------
>>  master            8192  134657    190       7303      24279 1.33
>>  rebalance         1024   36611    158       7303      20024 1.10
>>  single file      16384    6011   4054       7303    1046174 57.32
>>  per-slice file   16384    6011    207       7303      39073 2.14
>>
>>
>>  50M rows
>>  ===================================================================
>>                  nbatch  memory   time  size (MB)  temp (MB) amplif
>>  -------------------------------------------------------------------
>>  master           16384  265729    531      36500      48519 1.33
>>  rebalance         2048   53241    447      36500      48077 1.32
>>  single file          -       -      -      36500          -      -
>>  per-slice file   32768    8125    451      36500      78662 2.16
>>
>>
>> From those numbers it's pretty clear that per-slice overflow file does
>> by far the best job in enforcing work_mem and minimizing the amount of
>> data spilled to temp files. It does write a bit more data than both
>> master and the simple rebalancing, but that's the cost for enforcing
>> work_mem more strictly. It's generally a bit slower than those two
>> approaches, although on the largest scale it's actually a bit faster
>> than master. I think that's pretty acceptable, considering this is meant
>> to address extreme underestimates where we currently just eat memory.
>>
>> The case with single overflow file performs rather poorly - I haven't
>> even collected data from the largest scale, but considering it spilled
>> 1TB of temp files with a dataset half the size, that's not an issue.
>> (Note that this does not mean it needs 1TB of temp space, those writes
>> are spread over time and the files are created/closed as we go. The
>> system only has ~100GB of free disk space.)
>>
>>
>> Gunther, could you try the v2 and v4 patches on your data set? That
>> would be an interesting data point, I think.
>>
>>
>> regards
>>
