From: Nico Heller <nico(dot)heller(at)posteo(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: to_jsonb performance on array aggregated correlated subqueries
Date: 2022-08-12 19:02:36
Message-ID: 9335c458-a1fa-0730-9c1f-0483c39652cb@posteo.de
Lists: pgsql-performance
I knew I forgot something: We are currently on 13.6. When was this issue
fixed?
On 12.08.2022 20:56, Justin Pryzby wrote:
> What version of Postgres?
>
> I wonder if you're hitting the known memory leak involving JIT.
> Try with jit=off or jit_inline_above_cost=-1.
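
For reference, both settings can be tried per session before touching postgresql.conf (session-scoped SET, so it only affects the current connection):

    SET jit = off;                      -- disable JIT compilation entirely
    -- or, to keep JIT but stop inlining (the part implicated in the leak):
    SET jit_inline_above_cost = -1;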
>> Good day,
>>
>> consider the following query:
>>
>> WITH aggregation AS (
>>     SELECT
>>         a.*,
>>         (SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id) AS "bs",
>>         (SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id) AS "cs",
>>         (SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id) AS "ds",
>>         (SELECT array_agg(e.*) FROM e WHERE e.a_id = a.id) AS "es"
>>     FROM a WHERE a.id IN (<some big list, ranging from 20-180 entries>)
>> )
>> SELECT to_jsonb(aggregation.*) AS "value" FROM aggregation;
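
A variant worth measuring here (a sketch only, assuming the same schema) is to aggregate to jsonb inside each correlated subquery, so the outer to_jsonb never has to convert arrays of anonymous records:

    -- sketch: build jsonb per subquery instead of record arrays
    WITH aggregation AS (
        SELECT
            a.*,
            (SELECT jsonb_agg(to_jsonb(b.*)) FROM b WHERE b.a_id = a.id) AS "bs"
            -- ... same pattern for c, d and e ...
        FROM a WHERE a.id IN (<some big list>)
    )
    SELECT to_jsonb(aggregation.*) AS "value" FROM aggregation;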
>>
>> Imagine that for each "a" there exist between 5 and 100 rows in each of
>> "b", "c", "d" and "e", which makes the result pretty big (worst case:
>> around 300 kB when saved to a text file).
>>
>> I noticed that adding the "to_jsonb" doubles the query time, from
>> 9-10 ms to 17-23 ms on average.
>>
>> This may not seem slow at all, but this query has another issue: on an
>> AWS Aurora Serverless V2 instance, under high load (> 1000 queries/sec),
>> we see RAM usage of around 30-50 GB, compared to < 10 GB with a simple
>> LEFT JOINed query. CPU usage is also quite high.
>>
>> Is there anything I could improve? I am open to other solutions, but I
>> am wondering if I ran into an edge case of "to_jsonb" for "anonymous
>> records" (rows without a defined UDT) - this is just a wild guess though.
>>
>> I am mostly looking to decrease the load (CPU and memory) on Postgres
>> itself. Furthermore, I would like to know why the memory usage is so
>> significant. Any tips on how to analyze this issue are appreciated as
>> well - my knowledge is limited to being average at interpreting EXPLAIN
>> ANALYZE results.
>>
>> Here's a succinct list of the whys, what I have found out so far, and
>> solutions I already tried or don't want to consider:
>>
>> - LEFT JOINing potentially creates a huge result set because of the
>> cartesian product between "b", "c", "d" and "e" (with 100 rows in each,
>> that is 100^4 = 100,000,000 joined rows per row of "a"), so that's a
>> no-go
>>
>> - not using "to_jsonb" is sadly also not possible, as Postgres' array +
>> record output syntax is very unfriendly to parse in my application: it
>> is barely documented, the quoting rules are cumbersome, and the arrays
>> lack column names, which would make the parsing sensitive to future
>> table changes and thus hard to maintain (see the first example below
>> this list)
>>
>> - I know I could solve this with a separate query for a, b, c, d and e
>> while "joining" the results in my application, but I am looking for
>> another way to do it (bear with me, treat this as an academic question :))
>>
>> - I am using "to_jsonb" simply to map the result to my data model via a
>> JSON mapper
>>
>> - EXPLAIN ANALYZE does not show anything special when using "to_jsonb"
>> vs. not using it; the outermost (hash) join just takes more time - is
>> there a more granular EXPLAIN that shows me the runtime of functions
>> like "to_jsonb"? (a crude timing approach is sketched below this list)
>>
>> - I tried an approach where b, c, d and e were array columns of UDTs:
>> UDTs are not well supported by my application stack (JDBC) and are
>> generally undesirable for me (because of a lack of migration
>> possibilities)
>>
>> - I don't want to duplicate my data into another table (e.g. one with
>> jsonb columns)
>>
>> - MATERIALIZED VIEWs are also undesirable: they have to be refreshed
>> manually, and the refresh is non-incremental, so refreshing a big data
>> set would take a long time
>>
>> - splitting the query into chunks to reduce the IN() list size makes no
>> measurable difference
>>
>> - I don't want to use JSONB columns for b, c, d and e, because future
>> changes to their structure (e.g. new fields, changed data types) are
>> harder to achieve with JSONB, and it lacks constraint checks on insert
>> (e.g. NOT NULL on column b.xy)
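
To illustrate the parsing problem from the second list item: without to_jsonb, the client receives Postgres' array-of-record text syntax, keyless and with layered quoting. A toy example (made-up inline table):

    SELECT array_agg(t.*) FROM (VALUES (1, 'x,"y'), (2, NULL)) AS t(id, val);
    -- returns something like: {"(1,\"x,\"\"y\")","(2,)"}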
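
As for a more granular EXPLAIN: it won't break out expression evaluation per function, but a crude isolation is to run the same scan with and without the conversion and compare wall-clock times (psql shown, simplified to a single table):

    \timing on
    SELECT count(a.*)           FROM a;  -- baseline: fetch and form the rows
    SELECT count(to_jsonb(a.*)) FROM a;  -- same rows plus the jsonb conversion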
>>
>> Kind regards, and thank you for your time,
>>
>> Nico Heller
>>
>> P.S.: Sorry for the long list of "I don't want to do this" - some of
>> them are not possible because of other requirements