| From: | Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Extended statistics improvement: multi-column MCV missing values |
| Date: | 2026-06-01 22:18:20 |
| Message-ID: | CAN4CZFPySmmiepwG4+MH8nyDJb1xHK2HFcQfu4ZaEOO65jWKTg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello
> I am also not sure IN/ANY should be included in the first version.
I also only focused on testing the "basic" functionality.
> The strongest
> case is a top-level AND condition with equality clauses for all columns
I see one issue related to that with the current patch, having the
same amount of conditions as the table doesn't mean that we have
conditions for all columns.
See the following script that showcases a significant underestimate
when we use one column twice, and another 0 times in the conditions:
CREATE TABLE u (a int, b int, c int);
INSERT INTO u SELECT 0, (g % 5) + 1, g FROM generate_series(1, 50000) g;
INSERT INTO u SELECT (g % 30) + 10, (g % 30) + 10, (g % 30) + 10
FROM generate_series(1, 30000) g;
CREATE STATISTICS su (mcv) ON a, b, c FROM u;
ANALYZE u;
SELECT count(*) FROM u WHERE a = 0 AND b = ANY(ARRAY[1,2,3]); -- 30000
EXPLAIN SELECT * FROM u WHERE a = 0 AND b = ANY(ARRAY[1,2,3]);
-- ~18500, same as master
EXPLAIN SELECT * FROM u WHERE a = 0 AND a = ANY(ARRAY[0,9]) AND b =
ANY(ARRAY[1,2,3]); -- ~5200 vs ~11500 on master
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Andres Freund | 2026-06-01 21:57:15 | Re: Heads Up: cirrus-ci is shutting down June 1st |