From: | Nico Heller <nico(dot)heller(at)posteo(dot)de> |
---|---|
To: | Rick Otten <rottenwindfish(at)gmail(dot)com> |
Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: to_jsonb performance on array aggregated correlated subqueries |
Date: | 2022-08-12 19:18:33 |
Message-ID: | f79b14f6-f62a-34f6-4142-13295087e6c7@posteo.de |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Am 12.08.2022 um 21:15 schrieb Rick Otten:
>
>
> On Fri, Aug 12, 2022 at 3:07 PM Nico Heller <nico(dot)heller(at)posteo(dot)de> wrote:
>
> Am 12.08.2022 um 21:02 schrieb Rick Otten:
>
>>
>>
>> On Fri, Aug 12, 2022 at 2:50 PM Nico Heller
>> <nico(dot)heller(at)posteo(dot)de> wrote:
>>
>> Good day,
>>
>> consider the following query:
>>
>> WITH aggregation(
>> SELECT
>> a.*,
>> (SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id
>> <http://a.id>) as "bs",
>> (SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id
>> <http://a.id>) as "cs",
>> (SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id
>> <http://a.id>) as "ds",
>> (SELECT array_agg(e.*) FROM d WHERE e.a_id = a.id
>> <http://a.id>) as "es"
>> FROM a WHERE a.id <http://a.id> IN (<some big list,
>> ranging from 20-180 entries)
>> )
>> SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;
>>
>>
>> - You do have an index on `b.a_id` and `c.a_id`, etc... ? You
>> didn't say...
> Yes there are indices on all referenced columns of the subselect
> (they are all primary keys anyway)
>> - Are you sure it is the `to_jsonb` that is making this query slow?
> Yes, EXPLAIN ANALYZE shows a doubling of execution time - I don't
> have numbers on the memory usage difference though
>>
>> - Since you are serializing this for easy machine readable
>> consumption outside of the database, does it make a difference if
>> you use `to_json` instead?
>>
> Using to_json vs. to_jsonb makes no difference in regards to
> runtime, I will check if the memory consumption is different on
> monday - thank you for the idea!
>
>
> One other thought. Does it help if you convert the arrays to json
> first before you convert the whole row? ie, add some to_json()'s
> around the bs, cs, ds, es columns in the CTE. I'm wondering if
> breaking the json conversions up into smaller pieces will let the
> outer to_json() have less work to do and overall run faster. You
> could even separately serialize the elements inside the array too. I
> wouldn't think it would make a huge difference, you'd be making a
> bunch of extra to_json calls, but maybe it avoids some large memory
> structure that would otherwise have to be constructed to serialize all
> of those objects in all of the arrays all at the same time.
Using jsonb_array_agg and another to_jsonb at the (its still needed to
create one value at the end and to include the columns "a.*") worsens
the query performance by 100%, I can't speak for the memory usage
because I would have to push these changes to preproduction - will try
this on monday, thanks.
From | Date | Subject | |
---|---|---|---|
Next Message | Nico Heller | 2022-08-12 19:48:07 | Re: to_jsonb performance on array aggregated correlated subqueries |
Previous Message | Andres Freund | 2022-08-12 19:15:35 | Re: to_jsonb performance on array aggregated correlated subqueries |