Re: Hash-based MCV matching for large IN-lists

From: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Cc: David Geier <geidav(dot)pg(at)gmail(dot)com>, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, Tatsuya Kawata <kawatatatsuya0913(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash-based MCV matching for large IN-lists
Date: 2026-03-02 21:37:01
Message-ID: CAN4CZFO3Y25iCqqP_zS1ipgbrBXvAkxeLK2hPuamddyW9ouAzQ@mail.gmail.com
Lists: pgsql-hackers

Hello!

+    if (vardata.isunique && vardata.rel && vardata.rel->tuples >= 1.0)
+    {
+        s2 = 1.0 / vardata.rel->tuples;
+        if (HeapTupleIsValid(vardata.statsTuple))
+        {
+            Form_pg_statistic stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+            if (isInequality)
+                s2 = 1.0 - s2 - stats->stanullfrac;
+        }
+    }

Isn't there a corner case where this ordering of the "if" checks returns
an incorrect estimate (a regression)?
See the following test:

CREATE TABLE test AS SELECT generate_series(1, 1000) AS id;
CREATE UNIQUE INDEX ON test(id);
-- no ANALYZE

EXPLAIN SELECT * FROM test WHERE id <> ALL(ARRAY[1, 2, 3]);
-- Actual: rows=1
-- Expected: rows=997

ANALYZE test;
EXPLAIN SELECT * FROM test WHERE id <> ALL(ARRAY[1, 2, 3]);
-- Correct: rows=997

DROP TABLE test;
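
To make the arithmetic behind the two estimates concrete, here is a small standalone model, not PostgreSQL code. It assumes the outer condition of the quoted hunk already holds (unique column, tuples >= 1.0), that the per-element selectivities for an ALL clause are simply multiplied, and that the row estimate is rounded and clamped to at least one row. ColumnInfo and all function names are hypothetical. The "reordered" variant is only one possible way to restructure the check (treating the null fraction as 0 when no statistics exist); it is not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the planner inputs used by the quoted hunk. */
typedef struct
{
    double tuples;       /* estimated row count of the relation */
    bool   have_stats;   /* false before the first ANALYZE (no statsTuple) */
    double stanullfrac;  /* only meaningful when have_stats is true */
} ColumnInfo;

/* Mirrors the quoted hunk: the inequality inversion only runs with stats. */
static double
per_element_sel_quoted(const ColumnInfo *col, bool isInequality)
{
    double s2 = 1.0 / col->tuples;

    if (col->have_stats)
    {
        if (isInequality)
            s2 = 1.0 - s2 - col->stanullfrac;
    }
    return s2;  /* without stats this is still the equality estimate */
}

/* Assumed alternative: always invert, using nullfrac = 0 without stats. */
static double
per_element_sel_reordered(const ColumnInfo *col, bool isInequality)
{
    double s2 = 1.0 / col->tuples;

    if (isInequality)
    {
        double nullfrac = col->have_stats ? col->stanullfrac : 0.0;

        s2 = 1.0 - s2 - nullfrac;
    }
    return s2;
}

/* Simplified ALL combination: multiply, round, clamp to at least 1 row. */
static double
estimate_rows_all(double s2, int nelems, double tuples)
{
    double s1 = 1.0;
    double rows;
    int    i;

    for (i = 0; i < nelems; i++)
        s1 *= s2;

    rows = (double) (long) (s1 * tuples + 0.5);  /* round to nearest */
    return rows < 1.0 ? 1.0 : rows;
}
```

With tuples = 1000 and no statistics, the quoted ordering leaves s2 at 0.001 for each of the three elements, so the product is about 1e-9 and the estimate clamps to 1 row. Inverting unconditionally gives 0.999 per element and about 997 rows, which matches the post-ANALYZE result in the test above.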
