array_agg(anyarray) silently produces corrupt results with parallel workers when inputs mix NULL and non-NULL array elements

From: Dmytro Astapov <dastapov(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: array_agg(anyarray) silently produces corrupt results with parallel workers when inputs mix NULL and non-NULL array elements
Date: 2026-03-27 12:29:12
Message-ID: CAFQUnFj2pQ1HbGp69+w2fKqARSfGhAi9UOb+JjyExp7kx3gsqA@mail.gmail.com
Lists: pgsql-bugs

PostgreSQL version: 17.2 (also verified on 17.9 and 18.3)
Operating system: Linux x86_64 (Red Hat 8)

Short description
-----------------

array_agg(ARRAY[...]) produces silently corrupted 2-D arrays when the query
uses parallel partial aggregation and the input arrays contain a mix of
NULL and non-NULL elements. NULL values appear at the wrong positions in
the output, and non-NULL values disappear or shift.

The corruption is non-deterministic: two identical queries against the same
data can return different results.

Disabling parallelism with SET max_parallel_workers_per_gather = 0
eliminates the problem.

How to reproduce
----------------

The bug requires the planner to choose a Partial GroupAggregate (or Partial
HashAggregate) plan with parallel workers. This typically needs a table
large enough for the planner to prefer parallel aggregation (I used ~10M
rows in my tests).

-- Create and populate a test table (2M rows here; replicated to ~10M below).
-- Three NULL patterns in m1/m2/m3 columns:
-- ~68% rows: m1=val, m2=val, m3='' (no NULLs)
-- ~19% rows: m1=NULL, m2=NULL, m3=NULL (all NULL)
-- ~13% rows: m1=val, m2='', m3=val (no NULLs)

CREATE TABLE test_data (
    group_id int NOT NULL,
    branch_name text NOT NULL,
    branch_type text NOT NULL,
    m1 text,
    m2 text,
    m3 text
);

INSERT INTO test_data
SELECT
    (i / 4),
    'BRANCH_' || i,
    'Type' || (i % 3),
    CASE WHEN i%100<68 THEN 'val_'||i WHEN i%100<87 THEN NULL ELSE 'val_'||i END,
    CASE WHEN i%100<68 THEN 'v2_'||i WHEN i%100<87 THEN NULL ELSE '' END,
    CASE WHEN i%100<68 THEN '' WHEN i%100<87 THEN NULL ELSE 'v3_'||i END
FROM generate_series(1, 2000000) AS g(i);

-- Replicate to ~10M rows so the planner picks a parallel plan.
INSERT INTO test_data
SELECT group_id + 500001, branch_name || '_r' || r, branch_type, m1, m2, m3
FROM test_data, generate_series(1, 4) r;

ANALYZE test_data;

-- Verify the plan uses Partial GroupAggregate with parallel workers:
SET max_parallel_workers_per_gather = 4;
EXPLAIN (COSTS OFF)
SELECT group_id % 100000 AS gid, array_agg(ARRAY[m1, m2, m3]) AS m1m2m3s
FROM test_data GROUP BY group_id % 100000;

-- Expected plan:
-- Finalize GroupAggregate
--   ->  Gather Merge
--         Workers Planned: 4
--         ->  Partial GroupAggregate
--               ->  Sort
--                     ->  Parallel Seq Scan on test_data

-- Create ground truth (no parallelism):
SET max_parallel_workers_per_gather = 0;
CREATE TABLE gt AS
SELECT group_id % 100000 AS gid,
       array_agg(ARRAY[m1, m2, m3]) AS m1m2m3s
FROM test_data GROUP BY group_id % 100000;

-- Create parallel result:
SET max_parallel_workers_per_gather = 4;
CREATE TABLE pr AS
SELECT group_id % 100000 AS gid,
       array_agg(ARRAY[m1, m2, m3]) AS m1m2m3s
FROM test_data GROUP BY group_id % 100000;

-- Order-independent comparison (eliminates row-ordering differences,
-- detects only value corruption):
SELECT count(*) AS corrupted_groups
FROM (
    SELECT gid, array_agg(COALESCE(v, '!!NULL!!') ORDER BY v) AS sv
    FROM gt, unnest(m1m2m3s) v GROUP BY gid
) a
JOIN (
    SELECT gid, array_agg(COALESCE(v, '!!NULL!!') ORDER BY v) AS sv
    FROM pr, unnest(m1m2m3s) v GROUP BY gid
) b ON a.gid = b.gid
WHERE a.sv != b.sv;

On every run I have tested, corrupted_groups is in the range of 4000-6000
out of 100000 total groups (~5%). Results differ between runs, confirming
non-determinism.

I verified this on three versions: REL_17_2, REL_17_9, REL_18_3. All three
versions were built with: ./configure --enable-debug --enable-cassert

Root cause
----------

The bug appears to be in array_agg_array_combine() in
src/backend/utils/adt/array_userfuncs.c.

The combine function is used during parallel aggregation of
array_agg(anyarray). It was introduced in commit 16fd03e9565 ("Allow
parallel aggregate on string_agg and array_agg", 2023-01-23) and first
shipped in PostgreSQL 16.

When two partial aggregation states are combined, array_agg_array_combine
must merge their null bitmaps. The current code only enters the
bitmap-handling block when state2 (the incoming partial state) has a
nullbitmap:

if (state2->nullbitmap)
{
...
}

I think this misses the case where state1 (the running state) already has a
nullbitmap but state2 does not. In that scenario, state2's data bytes are
appended to state1's data buffer and state1->nitems is incremented, but the
nullbitmap is NOT extended to cover state2's items. The bit positions for
state2's items are left as uninitialized memory, which randomly marks some
elements as NULL. This shifts the interpretation of the data buffer and
corrupts the output.

For comparison, the non-parallel accumArrayResultArr() in arrayfuncs.c has
this condition:

if (astate->nullbitmap || ARR_HASNULL(arg))

which enters the bitmap block whenever EITHER the existing state has a
bitmap OR the new input has NULLs.

This bug triggers when parallel workers split a group's rows such that one
worker sees only NULL-containing arrays (building a state with a
nullbitmap) and another sees only non-NULL arrays (no nullbitmap), and the
combine function is called with the NULL-containing state as state1 and the
non-NULL state as state2. Since row distribution across workers is
non-deterministic, the corruption is too.

Fixing this condition would require a second, related change in the same
function. When extending the bitmap, the code computes:

int newaitems = state1->aitems + state2->aitems;

With the corrected condition (state1->nullbitmap || state2->nullbitmap),
state2->aitems can now be 0 (no bitmap was allocated for state2). This
makes newaitems equal to state1->aitems, which may be less than newnitems
(state1->nitems + state2->nitems).

The subsequent pg_nextpower2_32(newaitems) then allocates a bitmap which
will be too small, and array_bitmap_copy writes past the end of it.

This can be verified with --enable-cassert: running the reproduction
steps produces "problem in alloc set ExprContext: req size > alloc
size" warnings.

Fix
---

I think that the following two changes to array_agg_array_combine() in
array_userfuncs.c fix the issue:

1. Change the condition guarding the null bitmap block from
"if (state2->nullbitmap)" to
"if (state1->nullbitmap || state2->nullbitmap)".

2. Change the bitmap reallocation size from
"state1->aitems + state2->aitems" to
"Max(state1->aitems + state2->aitems, newnitems)"
to ensure the bitmap is always large enough.

The patch below applies cleanly to REL_16_STABLE through REL_18_STABLE
and master; I am also including it as an attachment:

--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -997,7 +997,7 @@
 		state1->data = (char *) repalloc(state1->data, state1->abytes);
 	}
 
-	if (state2->nullbitmap)
+	if (state1->nullbitmap || state2->nullbitmap)
 	{
 		int			newnitems = state1->nitems + state2->nitems;
 
@@ -1015,7 +1015,8 @@
 		}
 		else if (newnitems > state1->aitems)
 		{
-			int			newaitems = state1->aitems + state2->aitems;
+			int			newaitems = Max(state1->aitems + state2->aitems,
+										newnitems);
 
 			state1->aitems = pg_nextpower2_32(newaitems);
 			state1->nullbitmap = (bits8 *)

I have verified that this patch eliminates the corruption on all three
versions tested (17.2, 17.9, 18.3): the corrupted_groups count drops
from ~5000 to 0 in every run.

Best regards, Dmytro

Attachment Content-Type Size
fix_array_agg_parallel_nullbitmap.patch text/x-patch 631 bytes
