OOM-killer issue with a specific query

From: nabble(dot)30(dot)miller_2555(at)spamgourmet(dot)com
To: pgsql-performance(at)postgresql(dot)org
Subject: OOM-killer issue with a specific query
Date: 2011-12-19 15:52:40
Message-ID: CABLpH8y=QiHwshW6ss66_62_ddxQFQ=f9MtKv8GqThi-h90xbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi - I'm running into an OOM-killer issue when running a specific query (no
virtual machine running) and, based on researching the issue, I can
probably fix by making the following sysctl adjustments:
vm.overcommit_memory = 2
vm.overcommit_ratio = 0
However, I am perplexed as to why I am running into the issue in the first
place. The machine (running Linux 2.6.34.7-61.fc13.x86_64) is dedicated to
Postgres (v9.0.0 [RPM package: postgresql90-9.0.0-1PGDG.fc13.1.x86_64]) and
the following memory usage is pretty typical for the system (via "top"):
Mem: 8121992k total, 2901960k used, 5220032k free, 237408k buffers
Swap: 1048572k total, 235940k used, 812632k free, 2053768k cached
Under steady-state conditions, the following shows the virtual memory size
for postgres backend processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8506 postgres 20 0 2327m 3084 1792 S 0.0 0.0 0:00.33 postgres
8504 postgres 20 0 2326m 14m 13m S 0.0 0.2 0:01.32 postgres
8505 postgres 20 0 2326m 728 452 S 0.0 0.0 0:00.91 postgres
3582 postgres 20 0 2325m 54m 53m S 0.0 0.7 0:02.03 postgres
My current relevant postgresql.conf settings are the following:
shared_buffers = 2100MB
temp_buffers = 8MB
work_mem = 32MB
maintenance_work_mem = 16MB
max_stack_depth = 2MB
constraint_exclusion = partition
When executing the query, I've been watching the "top" activity, sorted by
resident memory. Upon execution, no other processes appear to take
additional resident memory, except a postgres backend process servicing the
query, which goes to +6Gb (triggering the OOM-killer). Given the settings
in postgresql.conf, and my anecdotal understanding of Postgres memory
management functions, I am uncertain why Postgres exhausts physical memory
instead of swapping to temporary files. Do I need to lower my work_mem
setting since the subquery involves a partitioned table, causing a
multiplier effect to the memory used (I have tried per-connection settings
of 6MB)? Would tweaking query planning settings help?

Thanks in advance!

If it helps, I have included the query (with column names aliased to their
data type), a brief description of the applicable table's contents, and an
abridged copy of the EXPLAIN ANALYZE output

SELECT "bigint", "date", "text"
FROM tableA AS A
WHERE A."boolean" = 'true' AND
(A."text" = 'abc' OR A."text" = 'xyz') AND
A."bigint" NOT IN (SELECT "bigint" FROM tableB)
ORDER BY A."date" DESC;

tableA:
- total table contains ~11 million records (total width: 109 bytes)
- partitioned by month (180 partitions)
- each table partition contains ~100k records
tableB:
- total table contains ~400k records (total width: 279 bytes)
- partitioned by month (96 partitions)
- each table partition contains ~30k records

EXPLAIN ANALYZE output:
Note: could not produce output for exact query due to OOM-killer, but
ran query by limiting the subquery to the first 50 results. The planner
iterates over all partitions, but only the first two partitions are noted
for brevity.

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=451279.67..451279.70 rows=10 width=55) (actual
time=18343.085..18343.090 rows=10 loops=1)
-> Sort (cost=451279.67..456398.37 rows=2047480 width=55) (actual
time=18343.083..18343.087 rows=10 loops=1)
Sort Key: A."Date"
Sort Method: top-N heapsort Memory: 26kB
-> Result (cost=1.21..407034.37 rows=2047480 width=55) (actual
time=0.793..17014.726 rows=4160606 loops=1)
-> Append (cost=1.21..407034.37 rows=2047480 width=55)
(actual time=0.792..16119.298 rows=4160606 loops=1)
-> Seq Scan on tableA A (cost=1.21..19.08 rows=1
width=44) (actual time=0.002..0.002 rows=0 loops=1)
Filter: ("boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
SubPlan 1
-> Limit (cost=0.00..1.08 rows=50 width=8)
(actual time=0.010..0.054 rows=50 loops=210)
-> Result (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
-> Append (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
-> Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0
loops=210)
-> Seq Scan on
tableB_201201 tableB (cost=0.00..15.30 rows=530 width=8) (actual
time=0.000..0.000 rows=0 loops=210)
-> Seq Scan on
tableB_201112 tableB (cost=0.00..251.25 rows=12125 width=8) (actual
time=0.006..0.019 rows=50 loops=210)
-> ...
-> Seq Scan on tableA_201201 A (cost=1.21..19.08
rows=1 width=44) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ("boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
SubPlan 1
-> Limit (cost=0.00..1.08 rows=50 width=8)
(actual time=0.010..0.054 rows=50 loops=210)
-> Result (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
-> Append (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
-> Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0
loops=210)
-> Seq Scan on
tableB_201201 tableB (cost=0.00..15.30 rows=530 width=8) (actual
time=0.000..0.000 rows=0 loops=210)
-> Seq Scan on
tableB_201112 tableB (cost=0.00..251.25 rows=12125 width=8) (actual
time=0.006..0.019 rows=50 loops=210)
-> ...
-> Seq Scan on tableA_201112 A (cost=1.21..794.69
rows=5980 width=55) (actual time=0.789..12.686 rows=12075 loops=1)
Filter: ("boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
SubPlan 1
-> Limit (cost=0.00..1.08 rows=50 width=8)
(actual time=0.010..0.054 rows=50 loops=210)
-> Result (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
-> Append (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
-> Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0
loops=210)
-> Seq Scan on
tableB_201201 tableB (cost=0.00..15.30 rows=530 width=8) (actual
time=0.000..0.000 rows=0 loops=210)
-> Seq Scan on
tableB_201112 tableB (cost=0.00..251.25 rows=12125 width=8) (actual
time=0.006..0.019 rows=50 loops=210)
-> Seq Scan on
tableB_201111 tableB (cost=0.00..604.89 rows=29189 width=8) (never
executed)
-> ...
-> Seq Scan on tableA_201111 A (cost=1.21..2666.12
rows=14670 width=55) (actual time=0.441..36.680 rows=29189 loops=1)
Filter: ("boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
SubPlan 1
-> Limit (cost=0.00..1.08 rows=50 width=8)
(actual time=0.010..0.054 rows=50 loops=210)
-> Result (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
-> Append (cost=0.00..9249.46
rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
-> Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0
loops=210)
-> Seq Scan on
tableB_201201 tableB (cost=0.00..15.30 rows=530 width=8) (actual
time=0.000..0.000 rows=0 loops=210)
-> Seq Scan on
tableB_201112 tableB (cost=0.00..251.25 rows=12125 width=8) (actual
time=0.006..0.019 rows=50 loops=210)
-> Seq Scan on
tableB_201111 tableB (cost=0.00..604.89 rows=29189 width=8) (never
executed)
-> ...
-> ...
Total runtime: 18359.851 ms
(23327 rows)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message ktm@rice.edu 2011-12-19 15:54:11 Re: Dramatic change in memory usage with version 9.1
Previous Message Rafael Martinez 2011-12-19 15:04:54 Dramatic change in memory usage with version 9.1