Re: Making jsonb_agg() faster

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Making jsonb_agg() faster
Date: 2025-11-05 09:14:40
Message-ID: 8A521EB7-2716-4E01-8505-98755B78D8EF@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Nov 4, 2025, at 02:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> writes:
>>> On Nov 3, 2025, at 04:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> PFA v3, rebased over 8a27d418f, no substantive changes whatever.
>
>> I am on vacation this week, I only find a hour in the evening and did an eyeball review without actually tracing and testing this patch set.
>
> Thanks for looking at it!
>
>> In elsewhere of the patch, you all use “JsonbInState ps = {0}” to do the initialization, only this place uses memset(). Can we keep consistent and use {0} here also. I see there is a “continue” and a “break” before the memset(), maybe you want to avoid unnecessary initialization. I guess that is a tiny saving, but if you think that saving is worthwhile, I’m fine with keeping the current code.
>
> I intentionally made the 0001 patch as mechanical as possible,
> and in particular did not second-guess existing decisions about
> whether to initialize a variable in the declaration or later.
> So I'm not excited about doing it differently in this one place.
> There are some other memsets in the same area that could also
> be replaced by "= {0}" if we cared to, but that seems like
> material for a separate cleanup patch.
>
>> Instead of hard-coding 12, can we use "sizeof(TimeTzADT)” for better readability?
>
> No, because that's not the same :-(. sizeof() would produce 16,
> on most machines anyway, thanks to alignment. But 12 is the
> type's size according to pg_type.
>
>> You left two “ugly hack” comments. Maybe add a short description for why they are hack and what can be improved in the future.
>
> It's the same ugly hack as before, I just formatted the code more
> legibly ... I didn't stop to look at whether there's a better way to
> do it. Again, that seems like material for a different patch.
>
> regards, tom lane

I got some time today, so I ran some performance tests, which show me a significant performance improvement from this patch.

To run the test, I disabled asserts and built with -O2. The tests ran on a MacBook M4.

To prepare for some data:
```
DROP TABLE IF EXISTS jtest;
CREATE TABLE jtest AS SELECT g AS id, (g % 1000) AS grp, g AS ival, (g::text || '_str') AS sval, now() + (g || ' seconds')::interval AS tsval FROM generate_series(1, 5_000_000) g;
VACUUM ANALYZE jtest;
```

Then I ran the following tests against both master and a patched branch:

1. Jsonb_agg

Patched:
```
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5810.921 ms (00:05.811)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5904.465 ms (00:05.904)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5857.314 ms (00:05.857)
```

Master:
```
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7310.648 ms (00:07.311)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7333.428 ms (00:07.333)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7257.666 ms (00:07.258)
```

Roughly 20% improvement.

2. Jsonb_object_agg

Patched:
```
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 767.258 ms
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 760.665 ms
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 772.326 ms
```

Master:
```
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1298.433 ms (00:01.298)
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1286.029 ms (00:01.286)
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1283.624 ms (00:01.284)
```

Roughly 40% improvement.

3. To_jsonb

Patched:
```
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2270.958 ms (00:02.271)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2313.483 ms (00:02.313)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2298.716 ms (00:02.299)
```

Master:
```
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2490.771 ms (00:02.491)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2514.326 ms (00:02.514)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2509.924 ms (00:02.510)
```

Roughly 8% improvement.

So, overall, this patch has done a great performance improvement.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2025-11-05 09:21:29 Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Previous Message Daniel Gustafsson 2025-11-05 08:56:13 Re: Changing the state of data checksums in a running cluster