Re: Hash-based MCV matching for large IN-lists

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Cc: David Geier <geidav(dot)pg(at)gmail(dot)com>, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, Tatsuya Kawata <kawatatatsuya0913(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash-based MCV matching for large IN-lists
Date: 2026-03-10 14:55:57
Message-ID: 8375f8b5-1f21-401e-bbc3-a749302b76c7@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In the thread discussing ALL semantics and NULL [0], the question was
raised about adding a new regression test that checks selectivity
estimation. If the change gets committed, it would make sense to add
tests for this case as well.

Regarding the idea of optimizing the loop when all per-element
selectivities are the same: I ran some quick tests to see how much the
change in the v7-0002 patch affects planning time. Even without that
patch, iterating over an array with 50k elements takes about 30 ms.

```
CREATE TABLE t (val bytea PRIMARY KEY);
INSERT INTO t SELECT int4send(i) FROM generate_series(1,50000) AS i;
ANALYZE t;
SELECT n_distinct FROM pg_stats WHERE tablename = 't';
 n_distinct
------------
         -1
(1 row)

SELECT string_agg(format('int4send(%s)', i), ',') FROM
generate_series(1,50000) AS i \gset
EXPLAIN (SUMMARY) SELECT * FROM t WHERE val = ANY
(ARRAY[:string_agg]::bytea[]);
..........
 Planning Time: 32.816 ms
(3 rows)
```

Given that, I don't see much benefit in adding additional logic here
just to avoid the loop. It would likely introduce extra code complexity
without a manful gain. If there is interest in optimization this case
further, I can revisit it and add the additional patch.

The patch v8 can still be reviewed as-is, and if the selectivity
regression test gets committed [0], I will add corresponding tests for
this change as well.

[0]:
https://www.postgresql.org/message-id/390a46f3-dbc4-4dc1-b49d-5cc61dd36026%40tantorlabs.com

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

Attachment Content-Type Size
v8-0001-Use-hash-based-MCV-matching-for-ScalarArrayOpExpr.patch text/x-patch 18.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2026-03-10 15:06:44 Re: another autovacuum scheduling thread
Previous Message Robert Haas 2026-03-10 14:55:26 Re: pg_plan_advice