Making jsonb_agg() faster

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Making jsonb_agg() faster
Date: 2025-07-22 16:37:02
Message-ID: 1060917.1753202222@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

There have been some complaints recently about how jsonb_agg()
is a lot slower than json_agg() [1]. That's annoying considering
that the whole selling point of JSONB is to have faster processing
than the original JSON type, so I poked into that. What I found
is that jsonb_agg() and its variants are just really inefficiently
implemented. Basically, for each aggregate input value, they will:

1. Build a JsonbValue tree representation of the input value.
2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
3. Iterate through the Jsonb, building a JsonbValue that is part
of the aggregate's state stored in aggcontext, but is otherwise
identical to what phase 1 built.

The motivation for this seems to have been to make sure that any
memory leakage during phase 1 does not happen in the long-lived
aggcontext. But it's hard not to call it a Rube Goldberg contraption.

The attached patch series gets rid of phases 2 and 3 by refactoring
pushJsonbValue() and related functions so that the JsonbValue tree
they construct can be constructed in a context that's not
CurrentMemoryContext. With that and some run-of-the-mill optimization
work, I'm getting 2.5X speedup for jsonb_agg on a text column (as
measured by the attached test script) and a bit over 2X on an int8
column. It's still a little slower than json_agg, but no longer
slower by integer multiples.

0001 is a somewhat invasive refactoring of the API for
pushJsonbValue and friends. It doesn't in itself have any
measurable speed consequences as far as I can tell, but I think
it makes the code nicer in any case. (I really do not like the
existing coding setup where sometimes it's important to capture
the result of pushJsonbValue and sometimes it's not; that
seems awfully confusing and bug-prone.) The real point though
is to have a way of commanding pushJsonbValue to build the
JsonbValue tree somewhere other than CurrentMemoryContext.

Having laid the groundwork with 0001, 0002 simply amounts to
telling pushJsonbValue to put its handiwork in the aggcontext
and then ripping out phases 2 and 3 of the aggregate transfns.

0003 is some simple micro-optimization of the datatype conversion
code in datum_to_jsonb_internal.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/CAHyXU0xQGXFBZ10GtqTkXL3_b8FbB79qP+XS2XCfxp+6WuH1Cg@mail.gmail.com

Attachment Content-Type Size
v1-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.patch text/x-diff 72.4 KB
v1-0002-Remove-fundamentally-redundant-processing-in-json.patch text/x-diff 14.1 KB
v1-0003-Micro-optimize-datatype-conversions-in-datum_to_j.patch text/x-diff 6.5 KB
jsonagg_speedtest.sql text/plain 377 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-07-22 17:13:00 Re: track generic and custom plans in pg_stat_statements
Previous Message Corey Huinker 2025-07-22 15:55:33 Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions