From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Performance optimization for ANALYZE with extended statistics (dependencies) |
Date: | 2025-06-11 16:36:43 |
Message-ID: | fddacb94-b703-4059-80ae-8d9821299f58@tantorlabs.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi hackers,
When calculating functional dependencies for extended statistics
(dependencies), the current implementation always sorts rows in sample
to measure the degree of dependency between columns.
However, if the column a8 in the dependency list has ndistinct = 1 (all
values are the same) or ndistinct = -1 (all values are unique), then the
functional dependency {a1, [... a7]} => a8 is always 1.0.
This patch adds a fast path to dependency_degree() to return 1.0
immediately in such cases, skipping the expensive sorting step, which
may involve up to many rows from sample.
There is a benchmark with the most extreme case with
'default_statistics_target = 10000' on a table with 3 million rows and 8
integer columns with unique values:
CREATE TABLE my_test_table_all_unique (
id SERIAL PRIMARY KEY,
col1 INTEGER,
col2 INTEGER,
col3 INTEGER,
col4 INTEGER,
col5 INTEGER,
col6 INTEGER,
col7 INTEGER,
col8 INTEGER
);
INSERT INTO my_test_table_all_unique (col1, col2, col3, col4, col5,
col6, col7, col8)
SELECT
n AS col1,
n + 10000000 AS col2,
n + 20000000 AS col3,
n + 30000000 AS col4,
n + 40000000 AS col5,
n + 50000000 AS col6,
n + 60000000 AS col7,
n + 70000000 AS col8
FROM generate_series(1, 3000000) as n;
CREATE STATISTICS (dependencies) ON col1, col2, col3, col4, col5, col6,
col7, col8 FROM my_test_table_all_unique;
Before patch:
ANALYZE my_test_table_all_unique;
Time: 142114,970 ms (02:22,115)
After patch:
ANALYZE my_test_table_all_unique;
Time: 1273,664 ms (00:01,274)
I am happy to hear any feedback or suggestions for improvement.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Optimize-functional-dependency-calculation-for-un.patch | text/x-patch | 1.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2025-06-11 16:39:40 | Re: strange perf regression with data checksums |
Previous Message | Sami Imseih | 2025-06-11 16:14:12 | Re: pg_get_multixact_members not documented |