Re: Hash-based MCV matching for large IN-lists

From:	Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
To:	Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Cc:	David Geier <geidav(dot)pg(at)gmail(dot)com>, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, Tatsuya Kawata <kawatatatsuya0913(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Hash-based MCV matching for large IN-lists
Date:	2026-03-02 21:37:01
Message-ID:	CAN4CZFO3Y25iCqqP_zS1ipgbrBXvAkxeLK2hPuamddyW9ouAzQ@mail.gmail.com
Lists:	pgsql-hackers

Hello!

+ if (vardata.isunique && vardata.rel && vardata.rel->tuples >= 1.0)
+ {
+     s2 = 1.0 / vardata.rel->tuples;
+     if (HeapTupleIsValid(vardata.statsTuple))
+     {
+         Form_pg_statistic stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+         if (isInequality)
+             s2 = 1.0 - s2 - stats->stanullfrac;
+     }
+ }

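Isn't there a corner case where this ordering of the if statements
returns an incorrect estimate (a regression)? The isInequality
adjustment only runs when a statistics tuple exists, so without
ANALYZE an inequality keeps the equality selectivity 1/tuples.
See the following test: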

CREATE TABLE test AS SELECT generate_series(1, 1000) AS id;
CREATE UNIQUE INDEX ON test(id);
-- no ANALYZE

EXPLAIN SELECT * FROM test WHERE id <> ALL(ARRAY[1, 2, 3]);
-- Actual: rows=1
-- Expected: rows=997

ANALYZE test;
EXPLAIN SELECT * FROM test WHERE id <> ALL(ARRAY[1, 2, 3]);
-- Correct: rows=997

DROP TABLE test;
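
To make the concern concrete, here is a minimal standalone sketch of the per-element logic. It is not the planner code: the function names, the flattened parameters, and the -1.0 "not handled here" return value are all hypothetical, and treating the null fraction as 0 when no statistics tuple exists is an assumption, not something the patch states. How the per-element value is later combined across the array elements is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the quoted hunk. Returns the per-element
 * selectivity s2, or -1.0 when the unique-column shortcut does not apply.
 */
double
elem_sel_as_posted(bool isunique, double tuples, bool have_stats,
                   double stanullfrac, bool is_inequality)
{
    double s2 = -1.0;

    assert(stanullfrac >= 0.0 && stanullfrac <= 1.0);

    if (isunique && tuples >= 1.0)
    {
        s2 = 1.0 / tuples;
        if (have_stats)
        {
            /* the inequality flip is reachable only with statistics */
            if (is_inequality)
                s2 = 1.0 - s2 - stanullfrac;
        }
    }
    return s2;
}

/*
 * One possible reordering: always flip for inequalities, and use the
 * null fraction only when statistics exist (assumed 0 otherwise).
 */
double
elem_sel_reordered(bool isunique, double tuples, bool have_stats,
                   double stanullfrac, bool is_inequality)
{
    double s2 = -1.0;

    assert(stanullfrac >= 0.0 && stanullfrac <= 1.0);

    if (isunique && tuples >= 1.0)
    {
        double nullfrac = have_stats ? stanullfrac : 0.0;

        s2 = 1.0 / tuples;
        if (is_inequality)
            s2 = 1.0 - s2 - nullfrac;
    }
    return s2;
}
```

With tuples = 1000, no statistics, and an inequality, the first function returns 1/1000 while the second returns 1 - 1/1000, which is the per-element value the not-analyzed case in the test above would need. Once statistics exist, both functions agree.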

In response to

Re: Hash-based MCV matching for large IN-lists at 2026-03-02 09:35:15 from Ilia Evdokimov

Responses

Re: Hash-based MCV matching for large IN-lists at 2026-03-10 14:55:57 from Ilia Evdokimov