Re: remaining sql/json patches

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, jian he <jian(dot)universality(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: remaining sql/json patches
Date: 2024-03-05 21:30:20
Message-ID: 1e0e5800-965e-42ed-8caa-362294604243@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I know very little about sql/json and all the json internals, but I
decided to do some black box testing. I built a large JSONB table
(single column, ~7GB of data after loading). And then I did a query
transforming the data into tabular form using JSON_TABLE.

The JSON_TABLE query looks like this:

SELECT jt.* FROM
title_jsonb t,
json_table(t.info, '$'
COLUMNS (
"id" text path '$."id"',
"type" text path '$."type"',
"title" text path '$."title"',
"original_title" text path '$."original_title"',
"is_adult" text path '$."is_adult"',
"start_year" text path '$."start_year"',
"end_year" text path '$."end_year"',
"minutes" text path '$."minutes"',
"genres" text path '$."genres"',
"aliases" text path '$."aliases"',
"directors" text path '$."directors"',
"writers" text path '$."writers"',
"ratings" text path '$."ratings"',
NESTED PATH '$."aliases"[*]'
COLUMNS (
"alias_title" text path '$."title"',
"alias_region" text path '$."region"'
),
NESTED PATH '$."directors"[*]'
COLUMNS (
"director_name" text path '$."name"',
"director_birth_year" text path '$."birth_year"',
"director_death_year" text path '$."death_year"'
),
NESTED PATH '$."writers"[*]'
COLUMNS (
"writer_name" text path '$."name"',
"writer_birth_year" text path '$."birth_year"',
"writer_death_year" text path '$."death_year"'
),
NESTED PATH '$."ratings"[*]'
COLUMNS (
"rating_average" text path '$."average"',
"rating_votes" text path '$."votes"'
)
)
) as jt;

again, not particularly complex. But if I run this, it consumes multiple
gigabytes of memory, before it gets killed by OOM killer. This happens
even when ran using

COPY (...) TO '/dev/null'

so there's nothing sent to the client. I did catch memory context info,
where it looks like this (complete stats attached):

------
TopMemoryContext: 97696 total in 5 blocks; 13056 free (11 chunks);
84640 used
...
TopPortalContext: 8192 total in 1 blocks; 7680 free (0 chunks); ...
PortalContext: 1024 total in 1 blocks; 560 free (0 chunks); ...
ExecutorState: 2541764672 total in 314 blocks; 6528176 free
(1208 chunks); 2535236496 used
printtup: 8192 total in 1 blocks; 7952 free (0 chunks); ...
...
...
Grand total: 2544132336 bytes in 528 blocks; 7484504 free
(1340 chunks); 2536647832 used
------

I'd say 2.5GB in ExecutorState seems a bit excessive ... Seems there's
some memory management issue? My guess is we're not releasing memory
allocated while parsing the JSON or building JSON output.

I'm not attaching the data, but I can provide that if needed - it's
about 600MB compressed. The structure is not particularly complex, it's
movie info from [1] combined into a JSON document (one per movie).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
memory.txt text/plain 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-03-05 21:50:44 Re: [PATCH] Exponential backoff for auth_delay
Previous Message Thomas Munro 2024-03-05 21:22:09 Re: processes stuck in shutdown following OOM/recovery