Performance optimization for ANALYZE with extended statistics (dependencies)

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Performance optimization for ANALYZE with extended statistics (dependencies)
Date: 2025-06-11 16:36:43
Message-ID: fddacb94-b703-4059-80ae-8d9821299f58@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

When calculating functional dependencies for extended statistics
(dependencies), the current implementation always sorts rows in sample
to measure the degree of dependency between columns.

However, if the column a8 in the dependency list has ndistinct = 1 (all
values are the same) or ndistinct = -1 (all values are unique), then the
functional dependency {a1, [... a7]} => a8 is always 1.0.

This patch adds a fast path to dependency_degree() to return 1.0
immediately in such cases, skipping the expensive sorting step, which
may involve up to many rows from sample.

There is a benchmark with the most extreme case with
'default_statistics_target = 10000' on a table with 3 million rows and 8
integer columns with unique values:

CREATE TABLE my_test_table_all_unique (
    id SERIAL PRIMARY KEY,
    col1 INTEGER,
    col2 INTEGER,
    col3 INTEGER,
    col4 INTEGER,
    col5 INTEGER,
    col6 INTEGER,
    col7 INTEGER,
    col8 INTEGER
);
INSERT INTO my_test_table_all_unique (col1, col2, col3, col4, col5,
col6, col7, col8)
SELECT
    n AS col1,
    n + 10000000 AS col2,
    n + 20000000 AS col3,
    n + 30000000 AS col4,
    n + 40000000 AS col5,
    n + 50000000 AS col6,
    n + 60000000 AS col7,
    n + 70000000 AS col8
FROM generate_series(1, 3000000) as n;
CREATE STATISTICS (dependencies) ON col1, col2, col3, col4, col5, col6,
col7, col8 FROM my_test_table_all_unique;

Before patch:
ANALYZE my_test_table_all_unique;
Time: 142114,970 ms (02:22,115)

After patch:
ANALYZE my_test_table_all_unique;
Time: 1273,664 ms (00:01,274)

I am happy to hear any feedback or suggestions for improvement.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment Content-Type Size
v1-0001-Optimize-functional-dependency-calculation-for-un.patch text/x-patch 1.8 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2025-06-11 16:39:40 Re: strange perf regression with data checksums
Previous Message Sami Imseih 2025-06-11 16:14:12 Re: pg_get_multixact_members not documented