| From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
|---|---|
| To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Cc: | yankairong(at)ruc(dot)edu(dot)cn |
| Subject: | BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI |
| Date: | 2026-05-16 10:07:20 |
| Message-ID: | 19481-4afc34accc9043fe@postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 19481
Logged by: muyehu
Email address: yankairong(at)ruc(dot)edu(dot)cn
PostgreSQL version: 18.4
Operating system: ubuntu22.04
Description:
I found what appears to be a planner selectivity issue involving
multivariate MCV statistics on expressions.
I can reproduce a case where expression MCV stats are applied for a plain
scan, but apparently not for an equivalent predicate when the same relation
appears on the nullable side of a LEFT JOIN.
Tested version:
PostgreSQL 18.4 on x86_64-pc-linux-gnu
Minimal self-contained repro:
DROP TABLE IF EXISTS mcv_bug;
CREATE TABLE mcv_bug (a int, b int, c int);
INSERT INTO mcv_bug
SELECT i, i, i
FROM generate_series(1,1000) AS g(i);
CREATE STATISTICS mcv_bug_stats (mcv)
ON (coalesce(mod(a,20),1)),
(coalesce(mod(b,10),1)),
(coalesce(mod(c,5),1))
FROM mcv_bug;
ANALYZE mcv_bug;
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM mcv_bug
WHERE coalesce(mod(a,20),1) = 1
AND coalesce(mod(b,10),1) = 1
AND coalesce(mod(c,5),1) = 1;
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM (VALUES (1)) AS d(x)
LEFT JOIN mcv_bug m ON true
WHERE coalesce(mod(m.a,20),1) = 1
AND coalesce(mod(m.b,10),1) = 1
AND coalesce(mod(m.c,5),1) = 1;
Observed output on my system:
Seq Scan on public.mcv_bug (cost=0.00..31.00 rows=50 width=12) (actual
time=0.013..0.074 rows=50 loops=1)
Output: a, b, c
Filter: ((COALESCE(mod(mcv_bug.a, 20), 1) = 1) AND
(COALESCE(mod(mcv_bug.b, 10), 1) = 1) AND (COALESCE(mod(mcv_bug.c, 5), 1) =
1))
Rows Removed by Filter: 950
Nested Loop Left Join (cost=0.00..41.01 rows=1 width=16) (actual
time=0.013..0.140 rows=50 loops=1)
Output: 1, m.a, m.b, m.c
Filter: ((COALESCE(mod(m.a, 20), 1) = 1) AND (COALESCE(mod(m.b, 10), 1) =
1) AND (COALESCE(mod(m.c, 5), 1) = 1))
Rows Removed by Filter: 950
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001
rows=1 loops=1)
-> Seq Scan on public.mcv_bug m (cost=0.00..16.00 rows=1000 width=12)
(actual time=0.010..0.065 rows=1000 loops=1)
Output: m.a, m.b, m.c
What seems wrong is that the plain query gets the expected estimate
(rows=50, matching the actual result), but the LEFT JOIN variant with the
same filter conditions falls back to a severe underestimate (rows=1, while
the actual result is 50).
Expected behavior:
I would expect the second query to receive a selectivity estimate comparable
to the first one, or at least to continue benefiting from the same
expression MCV statistics, since the predicate expressions are otherwise
equivalent.
Possible implementation hint:
This looks related to expression matching for extended statistics after
outer-join nullability decoration. In selfuncs.c, examine_variable() already
has logic and comments about stripping nullingrels before trying to match
expressions to extended statistics or expression indexes. However, the
extended-statistics path used for clause or expression matching appears to
still rely on direct expression equality in a way that no longer matches
once Vars on the nullable side of an outer join carry nullingrel markings.
This may be related to commit e28033fe1af8037e0fec8bb3a32fabbe18ac06b1
("Ignore nullingrels when looking up statistics"), which appears to fix a
closely related class of nullingrels-related statistics lookup issues.
However, the attached testcase still reproduces on PostgreSQL 18.4,
suggesting there may be a remaining gap specifically in multivariate
expression MCV clause matching on the nullable side of an outer join.
So this may be a residual nullingrels issue specific to multivariate
expression MCV matching.
If useful, I can test a patch or provide additional reduced cases.
## Reduced SQL Testcase
Compared to the original repro, this version removes the first `ANALYZE`
because the later `ANALYZE mcv_bug;` is sufficient to collect both regular
and extended statistics after `CREATE STATISTICS`.
```sql
DROP TABLE IF EXISTS mcv_bug;
CREATE TABLE mcv_bug (a int, b int, c int);
INSERT INTO mcv_bug
SELECT i, i, i
FROM generate_series(1,1000) AS g(i);
CREATE STATISTICS mcv_bug_stats (mcv)
ON (coalesce(mod(a,20),1)),
(coalesce(mod(b,10),1)),
(coalesce(mod(c,5),1))
FROM mcv_bug;
ANALYZE mcv_bug;
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM mcv_bug
WHERE coalesce(mod(a,20),1) = 1
AND coalesce(mod(b,10),1) = 1
AND coalesce(mod(c,5),1) = 1;
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM (VALUES (1)) AS d(x)
LEFT JOIN mcv_bug m ON true
WHERE coalesce(mod(m.a,20),1) = 1
AND coalesce(mod(m.b,10),1) = 1
AND coalesce(mod(m.c,5),1) = 1;
```
## Short Summary
The issue still reproduces on PostgreSQL 18.4:
- Plain scan estimate: `rows=50`, actual `rows=50`
- `LEFT JOIN` variant estimate: `rows=1`, actual `rows=50`
That strongly suggests the extended expression MCV stats are still not being
matched for equivalent predicates on the nullable side of the outer join.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | PG Bug reporting form | 2026-05-16 11:00:01 | BUG #19482: Recursive QueueFKConstraintValidation() lacks stack depth check |
| Previous Message | Henson Choi | 2026-05-16 09:39:13 | Re: BUG #19354: JOHAB rejects valid byte sequences |