From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Making jsonb_agg() faster |
Date: | 2025-07-22 16:37:02 |
Message-ID: | 1060917.1753202222@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
There have been some complaints recently about how jsonb_agg()
is a lot slower than json_agg() [1]. That's annoying considering
that the whole selling point of JSONB is to have faster processing
than the original JSON type, so I poked into that. What I found
is that jsonb_agg() and its variants are just really inefficiently
implemented. Basically, for each aggregate input value, they will:
1. Build a JsonbValue tree representation of the input value.
2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
3. Iterate through the Jsonb, building a JsonbValue that is part
of the aggregate's state stored in aggcontext, but is otherwise
identical to what phase 1 built.
The motivation for this seems to have been to make sure that any
memory leakage during phase 1 does not happen in the long-lived
aggcontext. But it's hard not to call it a Rube Goldberg contraption.
The attached patch series gets rid of phases 2 and 3 by refactoring
pushJsonbValue() and related functions so that the JsonbValue tree
they construct can be constructed in a context that's not
CurrentMemoryContext. With that and some run-of-the-mill optimization
work, I'm getting 2.5X speedup for jsonb_agg on a text column (as
measured by the attached test script) and a bit over 2X on an int8
column. It's still a little slower than json_agg, but no longer
slower by integer multiples.
0001 is a somewhat invasive refactoring of the API for
pushJsonbValue and friends. It doesn't in itself have any
measurable speed consequences as far as I can tell, but I think
it makes the code nicer in any case. (I really do not like the
existing coding setup where sometimes it's important to capture
the result of pushJsonbValue and sometimes it's not; that
seems awfully confusing and bug-prone.) The real point though
is to have a way of commanding pushJsonbValue to build the
JsonbValue tree somewhere other than CurrentMemoryContext.
Having laid the groundwork with 0001, 0002 simply amounts to
telling pushJsonbValue to put its handiwork in the aggcontext
and then ripping out phases 2 and 3 of the aggregate transfns.
0003 is some simple micro-optimization of the datatype conversion
code in datum_to_jsonb_internal.
Thoughts?
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.patch | text/x-diff | 72.4 KB |
v1-0002-Remove-fundamentally-redundant-processing-in-json.patch | text/x-diff | 14.1 KB |
v1-0003-Micro-optimize-datatype-conversions-in-datum_to_j.patch | text/x-diff | 6.5 KB |
jsonagg_speedtest.sql | text/plain | 377 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Sami Imseih | 2025-07-22 17:13:00 | Re: track generic and custom plans in pg_stat_statements |
Previous Message | Corey Huinker | 2025-07-22 15:55:33 | Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions |