Quick Links

Re: to_jsonb performance on array aggregated correlated subqueries

From:	Nico Heller <nico(dot)heller(at)posteo(dot)de>
To:	Rick Otten <rottenwindfish(at)gmail(dot)com>
Cc:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: to_jsonb performance on array aggregated correlated subqueries
Date:	2022-08-12 19:18:33
Message-ID:	f79b14f6-f62a-34f6-4142-13295087e6c7@posteo.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Am 12.08.2022 um 21:15 schrieb Rick Otten:
>
>
> On Fri, Aug 12, 2022 at 3:07 PM Nico Heller <nico(dot)heller(at)posteo(dot)de> wrote:
>
> Am 12.08.2022 um 21:02 schrieb Rick Otten:
>
>>
>>
>> On Fri, Aug 12, 2022 at 2:50 PM Nico Heller
>> <nico(dot)heller(at)posteo(dot)de> wrote:
>>
>> Good day,
>>
>> consider the following query:
>>
>> WITH aggregation(
>>      SELECT
>>             a.*,
>>            (SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id
>> <http://a.id>) as "bs",
>>            (SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id
>> <http://a.id>) as "cs",
>>            (SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id
>> <http://a.id>) as "ds",
>>            (SELECT array_agg(e.*) FROM d WHERE e.a_id = a.id
>> <http://a.id>) as "es"
>>      FROM a WHERE a.id <http://a.id> IN (<some big list,
>> ranging from 20-180 entries)
>> )
>> SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;
>>
>>
>> - You do have an index on `b.a_id` and `c.a_id`, etc... ? You
>> didn't say...
> Yes there are indices on all referenced columns of the subselect
> (they are all primary keys anyway)
>> - Are you sure it is the `to_jsonb` that is making this query slow?
> Yes, EXPLAIN ANALYZE shows a doubling of execution time - I don't
> have numbers on the memory usage difference though
>>
>> - Since you are serializing this for easy machine readable
>> consumption outside of the database, does it make a difference if
>> you use `to_json` instead?
>>
> Using to_json vs. to_jsonb makes no difference in regards to
> runtime, I will check if the memory consumption is different on
> monday - thank you for the idea!
>
>
> One other thought. Does it help if you convert the arrays to json
> first before you convert the whole row? ie, add some to_json()'s
> around the bs, cs, ds, es columns in the CTE. I'm wondering if
> breaking the json conversions up into smaller pieces will let the
> outer to_json() have less work to do and overall run faster. You
> could even separately serialize the elements inside the array too. I
> wouldn't think it would make a huge difference, you'd be making a
> bunch of extra to_json calls, but maybe it avoids some large memory
> structure that would otherwise have to be constructed to serialize all
> of those objects in all of the arrays all at the same time.

Using jsonb_array_agg and another to_jsonb at the (its still needed to
create one value at the end and to include the columns "a.*") worsens
the query performance by 100%, I can't speak for the memory usage
because I would have to push these changes to preproduction - will try
this on monday, thanks.

In response to

Re: to_jsonb performance on array aggregated correlated subqueries at 2022-08-12 19:07:20 from Nico Heller

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Nico Heller	2022-08-12 19:48:07	Re: to_jsonb performance on array aggregated correlated subqueries
Previous Message	Andres Freund	2022-08-12 19:15:35	Re: to_jsonb performance on array aggregated correlated subqueries