BUG #17844: Memory consumption for memoize node

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: alexey(dot)ermakov(at)dataegret(dot)com
Subject: BUG #17844: Memory consumption for memoize node
Date: 2023-03-15 09:08:18
Message-ID: 17844-d2f6f9e75a622bed@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17844
Logged by: Alexey Ermakov
Email address: alexey(dot)ermakov(at)dataegret(dot)com
PostgreSQL version: 14.7
Operating system: Ubuntu 20.04
Description:

Hello, I recently found an interesting query which consumes a lot of memory
(much more than expected with the current work_mem and hash_mem_multiplier
settings) and triggers the OOM killer.

After simplifying the query, it looks like there is a problem in the part where
we join a table on a condition like:
id = (((test_json.big_json_column)::json ->> 'id'::text))::integer
and there is a Memoize node in the plan. Without the Memoize node the query
works fine.
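
For comparison, a minimal way to get the plan without the Memoize node (not
something I include in the test case below) is to disable memoization for the
session and re-run the same query:

set enable_memoize = off;  -- PostgreSQL 14+: planner stops considering Memoize paths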

I was able to create a reproducible test case (on a machine with the default
config and PostgreSQL 14.7):

------------------------------------------------------------------------------
create extension pgcrypto;
--create function to generate dummy data
CREATE OR REPLACE FUNCTION generate_random_string(size INT) RETURNS TEXT AS
$$
DECLARE
    characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    bytes BYTEA := gen_random_bytes(size);
    l INT := length(characters);
    i INT := 0;
    output TEXT := '';
BEGIN
    WHILE i < size LOOP
        output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
        i := i + 1;
    END LOOP;
    RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
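
--optional sanity check (not part of my original script): the helper should
--return a random string of the requested length
select generate_random_string(16) as sample, length(generate_random_string(16)) as len;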

--create table with 200k rows and a text column with content like
--`{"random_string":"....", "id":1}`
create table test_json as
select id, ('{"random_string":"' || generate_random_string(500) || '", "id":' || id || '}')::text as big_json_column
from generate_series(1,200000) gs(id);

--duplicate rows and update statistics (to trigger memoize node)
insert into test_json select * from test_json;
analyze test_json;
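
--optional check (not part of my original script): the "id" key inside the
--json should match the integer id column, so this should return 0
select count(*) as mismatches
from test_json
where (big_json_column::json ->> 'id')::int <> id;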

--another table with similar structure and 100k rows
create table test_json2 as
select id, ('{"random_string":"' || generate_random_string(500) || '", "id":' || id || '}')::json as big_json_column
from generate_series(1,100000) gs(id);
alter table test_json2 add primary key(id);

--table sizes: test_json - 223MB, test_json2 - 56MB
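--the sizes above can be rechecked with something like this (not part of my
--original script):
select relname, pg_size_pretty(pg_total_relation_size(oid)) as total_size
from pg_class
where relname in ('test_json', 'test_json2');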

--change settings to trigger plan with memoize node
set work_mem = '128MB';
set enable_hashjoin = off;
set enable_mergejoin = off;
set jit = off;
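
--optional (not part of my original script): confirm the settings that steer
--the planner towards the nested loop + memoize plan
select name, setting
from pg_settings
where name in ('work_mem', 'enable_hashjoin', 'enable_mergejoin', 'enable_memoize', 'jit');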

explain (analyze, buffers)
select test_json.id, test_json2.id
from test_json
left join test_json2 on test_json2.id = ((test_json.big_json_column::json)->>'id')::int;

Nested Loop Left Join  (cost=0.32..108047.22 rows=400000 width=8) (actual time=0.031..3155.261 rows=400000 loops=1)
  Buffers: shared hit=415739 read=12834
  I/O Timings: read=18.254
  ->  Seq Scan on test_json  (cost=0.00..32572.00 rows=400000 width=540) (actual time=0.005..57.248 rows=400000 loops=1)
        Buffers: shared hit=15738 read=12834
        I/O Timings: read=18.254
  ->  Memoize  (cost=0.32..0.34 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=400000)
        Cache Key: (((test_json.big_json_column)::json ->> 'id'::text))::integer
        Cache Mode: logical
        Hits: 200000  Misses: 200000  Evictions: 0  Overflows: 0  Memory Usage: 16797kB
        Buffers: shared hit=400001
        ->  Index Only Scan using test_json2_pkey on test_json2  (cost=0.30..0.33 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=200000)
              Index Cond: (id = (((test_json.big_json_column)::json ->> 'id'::text))::integer)
              Heap Fetches: 0
              Buffers: shared hit=400001
Planning Time: 0.069 ms
Execution Time: 3227.078 ms
------------------------------------------------------------------------------

During execution I watched "avail Mem" in top output on the test machine to
check how much memory the process consumed. It looked different each time,
usually hundreds of MB, sometimes around 1.5GB (which is even bigger than the
table size).
I was able to trigger the OOM killer with this query and a bigger test_json
table with similar data.
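
For what it's worth (I have not captured such a dump for this report), one way
to see where the backend's memory goes is PostgreSQL 14's
pg_log_backend_memory_contexts(), which writes the memory contexts of a running
backend to the server log. From a second session, while the problem query is
executing:

select pg_log_backend_memory_contexts(pid)
from pg_stat_activity
where pid <> pg_backend_pid()
  and query like '%test_json2%';  -- pick the backend running the problem query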

I'm wondering:
1) Is this a known bug? Does it relate to json parsing somehow?
2) Is it possible to show such memory consumption in explain (analyze, buffers)
output for easier troubleshooting?

--
Thanks,
Alexey Ermakov
