[report] memory leaks in COPY FROM on partitioned table

From: Kohei KaiGai <kaigai(at)heterodb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [report] memory leaks in COPY FROM on partitioned table
Date: 2018-07-24 00:41:52
Message-ID: CAOP8fzYtVFWZADq4c=KoTAqgDrHWfng+AnEPEZccyxqxPVbbWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

In PG11beta2, my backend process gets terminated during COPY FROM of
large text file (544GB) on partitioned table.
The kernel log says OOM Killer send SIGKILL due to memory pressure.
In fact, 63GB of system physical 64GB was consumed by the PostgreSQL
backend just before the termination.

OOM Killer says:
[69267.687791] Out of memory: Kill process 23881 (postgres) score 977
or sacrifice child
[69267.687860] Killed process 23881 (postgres) total-vm:105210868kB,
anon-rss:63372320kB, file-rss:0kB, shmem-rss:126144kB

Configurations are below:
The 'lineorder' partition table has three child tables by
hash-partitioning on lo_orderkey (numeric).
Each child table has its own tablespace. 'lineorder_p0' is built on
the tablespace 'nvme0' on behalf of independent SSD device for
instance.
The query I run is:
copy lineorder from '/opt/nvme0/ssbm/lineorder.tbl' delimiter '|';

So, its contents shall be distributed to individual child tables,
based on in-database evaluation of hash-keys.

To track which memory-context consumes too much memory more than usual
expectations, I put elog() to report dying message prior to OOM
Killer.
See "aset-track.patch". It raises a warning message when memory
consumption per memory-context goes across watermark.

It says 'PortalContext' consumed 25GB at 04:26, then it grows up to
34GB at 05:21, and terminated at 05:51.
It looks to me somewhere allocates memory our of per-tuple memory
context, but I'm still under the investigation.

Any ideas?
------------
2018-07-25 04:26:54.096 JST [23881] WARNING: memory context
'PortalContext' grows up 25769803784 bytes
2018-07-25 04:26:54.096 JST [23881] CONTEXT: COPY lineorder, line
1610626836: "1610614887|1|18487099|541334|1474684|19980523|3-MEDIUM|0|30|4125930|30528526|2|4043411|82518|8|19980..."
WARNING: memory context 'PortalContext' grows up 25769803784 bytes
2018-07-25 04:27:07.202 JST [23865] LOG: checkpoints are occurring
too frequently (25 seconds apart)
:
<snip>
:
2018-07-25 05:21:22.423 JST [23881] WARNING: memory context
'PortalContext' grows up 34359738384 bytes
2018-07-25 05:21:22.423 JST [23881] CONTEXT: COPY lineorder, line
2147497762: "2147498439|7|910553|962168|773580|19971006|1-URGENT|0|46|5658552|38894795|1|5601966|73807|2|19971201..."
:
<snip>
:
2018-07-25 05:51:07.264 JST [23837] LOG: server process (PID 23881)
was terminated by signal 9: Killed
2018-07-25 05:51:07.264 JST [23837] DETAIL: Failed process was
running: copy lineorder from '/opt/nvme0/ssbm/lineorder.tbl' delimiter
'|';
------------

--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai(at)heterodb(dot)com>

Attachment Content-Type Size
aset-track.patch application/octet-stream 2.1 KB
OOMKiller.log application/octet-stream 10.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-07-24 01:11:43 Re: setproctitle_fast()
Previous Message Tom Lane 2018-07-23 23:04:09 Re: "interesting" issue with restore from a pg_dump with a database-wide search_path