| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Avoiding memory leakage in jsonpath evaluation |
| Date: | 2026-03-18 07:57:53 |
| Message-ID: | D311D367-8313-4288-8272-CFAD67ED7CEA@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Mar 18, 2026, at 15:52, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
>> On Mar 18, 2026, at 05:33, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> I got an off-list report that a query like this consumes
>> an unreasonable amount of memory:
>>
>> SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i),
>> '$[*] ? (@ < $)');
>>
>> For me, that eats about 6GB by the time it's done executing.
>> If that doesn't seem like a lot to you, just add another zero to the
>> generate_series call, and then it'll be more like 600GB, because the
>> leakage is O(N^2).
>>
>> Admittedly, this isn't an especially useful query: its runtime is
>> also O(N^2), because that path expression basically requires us to
>> compare every element of the input JSON array to every other element.
>> But it's not cool that it leaks so much memory while at it.
>>
>> I poked into this and found that the leakage is entirely composed of
>> "JsonValueList"s that are built during path evaluation and then just
>> left to rot until the end of jsonb_path_query(). We can fix it by
>> being careful to free those lists on the way out of each jsonpath
>> evaluation function that creates one. However, just doing that would
>> mean adding pfree overhead on top of palloc overhead, so I went a bit
>> further and reimplemented JsonValueList to be more compact and cheaper
>> to allocate/free. The attached seems to be a bit faster than the
>> existing code as well as not leaking so much memory. See the draft
>> commit message for more details.
>>
>> regards, tom lane
>
> This patch looks like a big win. It not only saves memory, but also makes the query much faster.
>
> I tested the query on my MacBook M4, increasing the iteration count from 10000 to 50000.
>
> Current master (3b4c2b9db25):
> ```
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 208581.771 ms (03:28.582)
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 217269.595 ms (03:37.270)
> ```
>
> With the patch:
> ```
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 18674.580 ms (00:18.675)
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 18889.329 ms (00:18.889)
> ```
>
> My observations were:
>
> * Before the patch, the backend process memory usage fluctuated between roughly 50GB and 145GB, while CPU usage stayed around 30%.
> * With the patch, the backend process memory usage stayed stable at around 30MB, while CPU usage stayed around 100%.
>
> After reviewing the patch, I thought JsonValueListLength() might be worth optimizing, since it is O(n). I tried adding an ntotal_items field to JsonValueList to track the total number of items, similar to the last pointer that is only meaningful in the base chunk. But that did not help in my test, and I realized JsonValueListLength() is not on the hottest path, so I dropped that idea.
>
> From the MacOS Instruments tool, the most expensive parts seem to be fillJsonbValue, JsonbIteratorNext, cmp_var_common, and cmp_numerics. But those look like separate topics.
>
> Overall, this looks like a solid patch.
>
Forgot to mention that, to run the tests, I turned off debug and assertion, and compiled with -O2.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Smith | 2026-03-18 08:05:12 | Re: Skipping schema changes in publication |
| Previous Message | Chao Li | 2026-03-18 07:52:40 | Re: Avoiding memory leakage in jsonpath evaluation |