Re: Improve row estimation with multi-column unique indexes

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve row estimation with multi-column unique indexes
Date: 2026-07-01 15:21:18
Message-ID: CAO6_Xqpg9t6n0iibUwqihfEx3xPwWtRFGenxi=tKWoCNqyCcGA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for the review!

On Sun, Jun 28, 2026 at 7:13 AM Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> wrote:
> I think the v1 patch identifies the right fact, but applies it at the wrong level.

Yeah, I definitely wasn't super happy about modifying path's row after the fact.

> A full unique equality match is a relation-level one-row proof;

I don't think that's true for all cases. For example, this won't work
with a unique partial index. In your version, you're using
relation_has_unique_index_for, which explicitly bails out if the index
is a partial index.

I've tried another approach in v3, by introducing the notion of
"unique path": A unique index with at least one equality operation on
all key columns. When this condition is met, we can assume that this
path will yield at most one row, and adjust path->rows, numIndexTuples
and index selectivity accordingly.

This logic was partially implemented in btcostestimate, is now
extracted in is_path_unique and called when index paths are built.
There are some minor changes done:

- IS NULL is accepted as a valid equal operation if NULLS NOT DISTINCT is set
- Finding an array doesn't invalidate uniqueness. "a=1 AND
a=ANY('{1,2,3}')" is considered valid for a unique path
- Most of the logic to build the selectivityQuals and compute
selectivity can be skipped when the path is unique.

This new unique_path property is stored in IndexPath and used in:
- cost_index to set path's row to 1
- genericcostestimate to adjust selectivity to 1.0/reltuples
- btcostestimate, replacing the previous conditions to set numIndexTuples = 1.0

There's also the option of passing unique_path as a parameter, but
that would require modifying the amcostestimate interface to add the
parameter.

> I also made the B-tree unique lookup shortcut require immediate uniqueness for the same reason

For the deferred constraint invalidating the uniqueness, the current
code doesn't check it, so I wonder if it was deliberate? Outside of
extreme cases like a transaction creating a lot of duplicated rows, an
estimation of 1 row is likely going to be closer to the truth than an
estimation done using statistics.

I've reused some of the tests you've written, and created a dedicated
explain_one_or_more_row function. It displays either 1 or >1 in the
row estimation, which can be used to check cases where we're sure the
row estimation should be 1.

I've split the patch in 3:
0001: Contains the new tests with the outputs when running on master.
0002: Adds unique_path to IndexPath and use it to update path rows,
numIndexTuples and index selectivity
0003: Moves building of indexBoundQuals to the else branch of
path->unique_path. The additional indentation + pgindent generated a
lot of line changes.

Regards,
Anthonin Bonnefoy

Attachment Content-Type Size
v3-0002-Adjust-row-estimation-and-index-selectivity-with-.patch application/octet-stream 17.2 KB
v3-0003-Avoid-building-indexBoundQuals-if-path-is-unique.patch application/octet-stream 22.4 KB
v3-0001-Added-planner-estimation-test-for-multi-column-ro.patch application/octet-stream 14.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2026-07-01 15:22:07 Re: Coverage (lcov) failing with inconsistent error in versions 2.x
Previous Message Kirk Wolak 2026-07-01 15:17:57 Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions