Re: Making jsonb_agg() faster

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Making jsonb_agg() faster
Date: 2025-08-27 01:22:17
Message-ID: 2613D418-67E0-4DD8-BDA6-AB1BB04DB1A2@gmail.com
Lists: pgsql-hackers


>> On Aug 23, 2025, at 03:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>>
>> v2-0001 takes care of that, and also adopts your suggestion in [1]
>> about not using two calls of pushJsonbValueScalar where one would do.
>> I also did a bit more micro-optimization in appendKey, appendValue,
>> appendElement to avoid redundant copying, because perf testing showed
>> that appendElement is still a hot-spot for jsonb_agg. Patches 0002
>> and 0003 are unchanged.
>>
>
>

A few more suggestions for pushJsonbValue():

+ /* If an object or array is pushed, recursively push its contents */
+ if (jbval->type == jbvObject)
{
pushJsonbValue(pstate, WJB_BEGIN_OBJECT, NULL);
for (i = 0; i < jbval->val.object.nPairs; i++)
@@ -581,32 +607,29 @@ pushJsonbValue(JsonbParseState **pstate, JsonbIteratorToken seq,
pushJsonbValue(pstate, WJB_KEY, &jbval->val.object.pairs[i].key);
pushJsonbValue(pstate, WJB_VALUE, &jbval->val.object.pairs[i].value);
}
-
- return pushJsonbValue(pstate, WJB_END_OBJECT, NULL);
+ pushJsonbValue(pstate, WJB_END_OBJECT, NULL);
+ return;
}

To push WJB_BEGIN_OBJECT and WJB_END_OBJECT, we can call pushJsonbValueScalar() directly. Once inside pushJsonbValue(), these tokens just satisfy the (seq != WJB_ELEM && seq != WJB_VALUE) check and are forwarded to pushJsonbValueScalar() anyway. Calling pushJsonbValueScalar() directly saves one level of recursion.

- if (jbval && (seq == WJB_ELEM || seq == WJB_VALUE) && jbval->type == jbvArray)
+ if (jbval->type == jbvArray)
{
pushJsonbValue(pstate, WJB_BEGIN_ARRAY, NULL);
for (i = 0; i < jbval->val.array.nElems; i++)
{
pushJsonbValue(pstate, WJB_ELEM, &jbval->val.array.elems[i]);
}
-
- return pushJsonbValue(pstate, WJB_END_ARRAY, NULL);
+ pushJsonbValue(pstate, WJB_END_ARRAY, NULL);
+ return;
}

The same applies to pushing WJB_BEGIN_ARRAY and WJB_END_ARRAY: pushJsonbValueScalar() can be called directly there too.

And for pushJsonbValueScalar():

- (*pstate)->size = 4;
+ ppstate->size = 4; /* initial guess at array size */

Could we allocate lazily instead? That is, start with size = 0 and allocate memory only when the first element is pushed. That way, we would not allocate any memory for empty objects and arrays.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
