Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY

From: cca5507 <cca5507(at)qq(dot)com>
To: surya poondla <suryapoondla4(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Giuliano Gagliardi <gogi(at)gogi(dot)tv>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
Date: 2026-03-21 06:19:56
Message-ID: tencent_2DDB9D84961482D40AA55CC88CE0E51D740A@qq.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Surya,

> For issue 1, my fix removes the IS NOT NULL guard from the pre-check so
> that *= can detect all duplicate rows, including those containing NULLs.
> (Note: The semantics of *=  has always treated NULL as equal to NULL.)
>
> The reasoning is straightforward: the JOIN uses *= to match newdata rows
> against MV rows. If newdata contains two *=-equal rows, both would match
> the same MV row in the JOIN, producing a wrong diff. The pre-check must
> therefore use the same *= semantics to catch exactly those duplicates 
> which is what my fix does by removing the IS NOT NULL guard.
> The IS NOT NULL guard was the bug as it was hiding real duplicates from detection.

If I understand correctly, your fix will break the following case which works well currently:

CREATE TABLE t (a int, b int);
INSERT INTO t VALUES (null, null);
CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
CREATE UNIQUE INDEX ON m(a);
INSERT INTO t VALUES (null, null);
REFRESH MATERIALIZED VIEW CONCURRENTLY m;

Your fix will report an error because of the two (null, null) rows. On master, this case
works well because of the join condition "mv.a = newdada.a" which considers two
NULLs not equal, so we will get a correct diff table.

> Your approach leaves the pre-check unchanged and instead replaces *= in
> the JOIN with record_image_eq_variant (NULL != NULL). I see two concerns:
> 1. record_image_eq_variant applies NULL != NULL globally to all rows in
>    the JOIN, not just duplicate ones. This means any unchanged row
>    containing any NULL in any column will never match its counterpart
>    during the JOIN, causing a DELETE + INSERT for that row on every
>    refresh even when the data has not changed. The original issue 2 was
>    specifically about nullable indexed columns, your fix extends the
>    performance problem to all nullable columns anywhere in the row,
>    which makes the performance worse than issue 2.

Yeah, you're right. Any row containing any NULL in any column will get into the
diff table. But it's for correctness. Maybe user should avoid using CONCURRENTLY
with a lot of rows containing NULL.

> 2. The error surfaced becomes a unique_violation from the index rather
>    than the explicit "contains duplicate rows" message, which is harder
>    for users to diagnose.

I don't think it's a big issue.

--
Regards,
ChangAo Chen

In response to

Browse pgsql-bugs by date

  From Date Subject
Previous Message Tom Lane 2026-03-20 20:26:13 Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables