BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: yankairong(at)ruc(dot)edu(dot)cn
Subject: BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI
Date: 2026-05-16 10:07:20
Message-ID: 19481-4afc34accc9043fe@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 19481
Logged by: muyehu
Email address: yankairong(at)ruc(dot)edu(dot)cn
PostgreSQL version: 18.4
Operating system: ubuntu22.04
Description:

I found what appears to be a planner selectivity issue involving
multivariate MCV statistics on expressions.

I can reproduce a case where expression MCV stats are applied for a plain
scan, but apparently not for an equivalent predicate when the same relation
appears on the nullable side of a LEFT JOIN.

Tested version:

PostgreSQL 18.4 on x86_64-pc-linux-gnu

Minimal self-contained repro:

DROP TABLE IF EXISTS mcv_bug;
CREATE TABLE mcv_bug (a int, b int, c int);
INSERT INTO mcv_bug
SELECT i, i, i
FROM generate_series(1,1000) AS g(i);

CREATE STATISTICS mcv_bug_stats (mcv)
ON (coalesce(mod(a,20),1)),
(coalesce(mod(b,10),1)),
(coalesce(mod(c,5),1))
FROM mcv_bug;

ANALYZE mcv_bug;

EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM mcv_bug
WHERE coalesce(mod(a,20),1) = 1
AND coalesce(mod(b,10),1) = 1
AND coalesce(mod(c,5),1) = 1;

EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM (VALUES (1)) AS d(x)
LEFT JOIN mcv_bug m ON true
WHERE coalesce(mod(m.a,20),1) = 1
AND coalesce(mod(m.b,10),1) = 1
AND coalesce(mod(m.c,5),1) = 1;

Observed output on my system:

Seq Scan on public.mcv_bug (cost=0.00..31.00 rows=50 width=12) (actual
time=0.013..0.074 rows=50 loops=1)
Output: a, b, c
Filter: ((COALESCE(mod(mcv_bug.a, 20), 1) = 1) AND
(COALESCE(mod(mcv_bug.b, 10), 1) = 1) AND (COALESCE(mod(mcv_bug.c, 5), 1) =
1))
Rows Removed by Filter: 950

Nested Loop Left Join (cost=0.00..41.01 rows=1 width=16) (actual
time=0.013..0.140 rows=50 loops=1)
Output: 1, m.a, m.b, m.c
Filter: ((COALESCE(mod(m.a, 20), 1) = 1) AND (COALESCE(mod(m.b, 10), 1) =
1) AND (COALESCE(mod(m.c, 5), 1) = 1))
Rows Removed by Filter: 950
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001
rows=1 loops=1)
-> Seq Scan on public.mcv_bug m (cost=0.00..16.00 rows=1000 width=12)
(actual time=0.010..0.065 rows=1000 loops=1)
Output: m.a, m.b, m.c

What seems wrong is that the plain query gets the expected estimate
(rows=50, matching the actual result), but the LEFT JOIN variant with the
same filter conditions falls back to a severe underestimate (rows=1, while
the actual result is 50).

Expected behavior:

I would expect the second query to receive a selectivity estimate comparable
to the first one, or at least to continue benefiting from the same
expression MCV statistics, since the predicate expressions are otherwise
equivalent.

Possible implementation hint:

This looks related to expression matching for extended statistics after
outer-join nullability decoration. In selfuncs.c, examine_variable() already
has logic and comments about stripping nullingrels before trying to match
expressions to extended statistics or expression indexes. However, the
extended-statistics path used for clause or expression matching appears to
still rely on direct expression equality in a way that no longer matches
once Vars on the nullable side of an outer join carry nullingrel markings.

This may be related to commit e28033fe1af8037e0fec8bb3a32fabbe18ac06b1
("Ignore nullingrels when looking up statistics"), which appears to fix a
closely related class of nullingrels-related statistics lookup issues.
However, the attached testcase still reproduces on PostgreSQL 18.4,
suggesting there may be a remaining gap specifically in multivariate
expression MCV clause matching on the nullable side of an outer join.

So this may be a residual nullingrels issue specific to multivariate
expression MCV matching.

If useful, I can test a patch or provide additional reduced cases.

## Reduced SQL Testcase

Compared to the original repro, this version removes the first `ANALYZE`
because the later `ANALYZE mcv_bug;` is sufficient to collect both regular
and extended statistics after `CREATE STATISTICS`.

```sql
DROP TABLE IF EXISTS mcv_bug;
CREATE TABLE mcv_bug (a int, b int, c int);

INSERT INTO mcv_bug
SELECT i, i, i
FROM generate_series(1,1000) AS g(i);

CREATE STATISTICS mcv_bug_stats (mcv)
ON (coalesce(mod(a,20),1)),
(coalesce(mod(b,10),1)),
(coalesce(mod(c,5),1))
FROM mcv_bug;

ANALYZE mcv_bug;

EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM mcv_bug
WHERE coalesce(mod(a,20),1) = 1
AND coalesce(mod(b,10),1) = 1
AND coalesce(mod(c,5),1) = 1;

EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM (VALUES (1)) AS d(x)
LEFT JOIN mcv_bug m ON true
WHERE coalesce(mod(m.a,20),1) = 1
AND coalesce(mod(m.b,10),1) = 1
AND coalesce(mod(m.c,5),1) = 1;
```

## Short Summary

The issue still reproduces on PostgreSQL 18.4:

- Plain scan estimate: `rows=50`, actual `rows=50`
- `LEFT JOIN` variant estimate: `rows=1`, actual `rows=50`

That strongly suggests the extended expression MCV stats are still not being
matched for equivalent predicates on the nullable side of the outer join.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2026-05-16 11:00:01 BUG #19482: Recursive QueueFKConstraintValidation() lacks stack depth check
Previous Message Henson Choi 2026-05-16 09:39:13 Re: BUG #19354: JOHAB rejects valid byte sequences