| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Making jsonb_agg() faster |
| Date: | 2025-11-05 09:14:40 |
| Message-ID: | 8A521EB7-2716-4E01-8505-98755B78D8EF@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Nov 4, 2025, at 02:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> writes:
>>> On Nov 3, 2025, at 04:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> PFA v3, rebased over 8a27d418f, no substantive changes whatever.
>
>> I am on vacation this week, I only find a hour in the evening and did an eyeball review without actually tracing and testing this patch set.
>
> Thanks for looking at it!
>
>> In elsewhere of the patch, you all use “JsonbInState ps = {0}” to do the initialization, only this place uses memset(). Can we keep consistent and use {0} here also. I see there is a “continue” and a “break” before the memset(), maybe you want to avoid unnecessary initialization. I guess that is a tiny saving, but if you think that saving is worthwhile, I’m fine with keeping the current code.
>
> I intentionally made the 0001 patch as mechanical as possible,
> and in particular did not second-guess existing decisions about
> whether to initialize a variable in the declaration or later.
> So I'm not excited about doing it differently in this one place.
> There are some other memsets in the same area that could also
> be replaced by "= {0}" if we cared to, but that seems like
> material for a separate cleanup patch.
>
>> Instead of hard-coding 12, can we use "sizeof(TimeTzADT)” for better readability?
>
> No, because that's not the same :-(. sizeof() would produce 16,
> on most machines anyway, thanks to alignment. But 12 is the
> type's size according to pg_type.
>
>> You left two “ugly hack” comments. Maybe add a short description for why they are hack and what can be improved in the future.
>
> It's the same ugly hack as before, I just formatted the code more
> legibly ... I didn't stop to look at whether there's a better way to
> do it. Again, that seems like material for a different patch.
>
> regards, tom lane
I got some time today, so I ran some performance tests, which show me a significant performance improvement from this patch.
To run the test, I disabled asserts and built with -O2. The tests ran on a MacBook M4.
To prepare for some data:
```
DROP TABLE IF EXISTS jtest;
CREATE TABLE jtest AS SELECT g AS id, (g % 1000) AS grp, g AS ival, (g::text || '_str') AS sval, now() + (g || ' seconds')::interval AS tsval FROM generate_series(1, 5_000_000) g;
VACUUM ANALYZE jtest;
```
Then I ran the following tests against both master and a patched branch:
1. Jsonb_agg
Patched:
```
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5810.921 ms (00:05.811)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5904.465 ms (00:05.904)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5857.314 ms (00:05.857)
```
Master:
```
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7310.648 ms (00:07.311)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7333.428 ms (00:07.333)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7257.666 ms (00:07.258)
```
Roughly 20% improvement.
2. Jsonb_object_agg
Patched:
```
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 767.258 ms
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 760.665 ms
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 772.326 ms
```
Master:
```
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1298.433 ms (00:01.298)
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1286.029 ms (00:01.286)
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1283.624 ms (00:01.284)
```
Roughly 40% improvement.
3. To_jsonb
Patched:
```
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2270.958 ms (00:02.271)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2313.483 ms (00:02.313)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2298.716 ms (00:02.299)
```
Master:
```
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2490.771 ms (00:02.491)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2514.326 ms (00:02.514)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2509.924 ms (00:02.510)
```
Roughly 8% improvement.
So, overall, this patch has done a great performance improvement.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Heikki Linnakangas | 2025-11-05 09:21:29 | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
| Previous Message | Daniel Gustafsson | 2025-11-05 08:56:13 | Re: Changing the state of data checksums in a running cluster |