Re: Hash-based MCV matching for large IN-lists

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, David Geier <geidav(dot)pg(at)gmail(dot)com>, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, Tatsuya Kawata <kawatatatsuya0913(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash-based MCV matching for large IN-lists
Date: 2026-03-02 09:35:15
Message-ID: 03aa576f-13bb-4ed1-bee4-0ee69ad8813c@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've addressed the previously mentioned issues in v7 patches.

I also retested the hash-based MCV path using bytea as the data type.

```
CREATE TABLE t (val bytea);
INSERT INTO t SELECT int4send(i) FROM generate_series(1, 10000) AS i,
generate_series(1, 50);

ALTER TABLE t ALTER COLUMN val SET STATISTICS 10000;
ANALYZE t;
SELECT string_agg(format('int4send(%s)', v), ',') FROM
generate_series(1, 10000) AS gs(v) \gset
EXPLAIN (SUMMARY) SELECT * FROM t WHERE val =
ANY(ARRAY[:string_agg]::bytea[]);
```

Planning Time Speedup

default_statistics_target | Before (ms) | After (ms) | Speedup (x)
--------------------------------------------------------------------
100                       | 0.984       | 0.697      | 1.41
500                       | 1.260       | 0.984      | 1.28
1000                      | 4.183       | 1.825      | 2.29
2500                      | 64.715      | 1.298      | 49.86
5000                      | 251.619     | 4.751      | 52.96
7500                      | 562.775     | 2.895      | 194.40
10000                     | 998.330     | 3.561      | 280.36

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

Attachment Content-Type Size
v7-0003-Use-hash-based-MCV-matching-for-ScalarArrayOpExpr.patch text/x-patch 17.5 KB
v7-0002-Use-O-1-selectivity-formula-for-eqsel-neqsel-IN-A.patch text/x-patch 6.4 KB
v7-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch text/x-patch 4.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2026-03-02 10:16:40 Re: pg_buffercache: Add per-relation summary stats
Previous Message Chao Li 2026-03-02 09:32:55 Question: rebuilding frontend tools after libpgfeutils.a changes?