Re: Making jsonb_agg() faster

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Making jsonb_agg() faster
Date: 2025-08-14 08:59:56
Message-ID: CACJufxHuE-P0s1PxiRja26ZMjKUZiCAgWa1jedswo94SFzjXMg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 23, 2025 at 12:37 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> There have been some complaints recently about how jsonb_agg()
> is a lot slower than json_agg() [1]. That's annoying considering
> that the whole selling point of JSONB is to have faster processing
> than the original JSON type, so I poked into that. What I found
> is that jsonb_agg() and its variants are just really inefficiently
> implemented. Basically, for each aggregate input value, they will:
>
> 1. Build a JsonbValue tree representation of the input value.
> 2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
> 3. Iterate through the Jsonb, building a JsonbValue that is part
> of the aggregate's state stored in aggcontext, but is otherwise
> identical to what phase 1 built.
>
> The motivation for this seems to have been to make sure that any
> memory leakage during phase 1 does not happen in the long-lived
> aggcontext. But it's hard not to call it a Rube Goldberg contraption.
>
> The attached patch series gets rid of phases 2 and 3 by refactoring
> pushJsonbValue() and related functions so that the JsonbValue tree
> they construct can be constructed in a context that's not
> CurrentMemoryContext. With that and some run-of-the-mill optimization
> work, I'm getting 2.5X speedup for jsonb_agg on a text column (as
> measured by the attached test script) and a bit over 2X on an int8
> column. It's still a little slower than json_agg, but no longer
> slower by integer multiples.
>
> 0001 is a somewhat invasive refactoring of the API for
> pushJsonbValue and friends. It doesn't in itself have any
> measurable speed consequences as far as I can tell, but I think
> it makes the code nicer in any case. (I really do not like the
> existing coding setup where sometimes it's important to capture
> the result of pushJsonbValue and sometimes it's not; that
> seems awfully confusing and bug-prone.) The real point though
> is to have a way of commanding pushJsonbValue to build the
> JsonbValue tree somewhere other than CurrentMemoryContext.
>

in v1-0001
static void
copyScalarSubstructure(JsonbValue *v, MemoryContext outcontext)
{
}
outcontext will always be NULL in 0001.
that means copyScalarSubstructure is never really called.
it also means that JsonbInState->MemoryContext was always as NULL in 00001
maybe we don't need JsonbInState->MemoryContext in 0001
So 0001 refactoring change would be less invasive.

minor issue: copyScalarSubstructure (v->type) order aligned with (enum
jbvType) would be great.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Reshke 2025-08-14 09:02:27 Re: ALTER DOMAIN ADD NOT NULL NOT VALID
Previous Message kasaharatt 2025-08-14 08:56:40 Re: Add log_autovacuum_{vacuum|analyze}_min_duration