Re: O(N^2) when building multi-column MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: O(N^2) when building multi-column MCV lists
Date: 2019-06-21 10:25:50
Message-ID: 20190621102550.x6ohcp6hwg5efkc7@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached is a WIP/PoC fix addressing the O(N^2) behavior in ANALYZE with
high statistic target values. It needs more work, but it's good enough to
show some measurements.

For benchmark, I've created a simple 2-column table, with MCV list on
those two columns:

CREATE TABLE t (a int, b int);
CREATE STATISTICS s (mcv) ON a,b FROM t;

and then loaded data sets with different numbers of random combinations,
determined by number of values in each column. For example with 10 values
in a column, you get ~100 combinations.

INSERT INTO t
SELECT 10*random(), 10*random() FROM generate_series(1,3e6);

The 3M rows is picked because that's the sample size with target 10000.

The results with different statistic targets look like this:

1) master

values 100 1000 5000 10000
====================================================
10 103 586 2419 3041
100 116 789 4778 8934
1000 116 690 3162 499748

2) patched

values 100 1000 5000 10000
====================================================
10 113 606 2460 3716
100 143 711 3371 5231
1000 156 994 3836 6002

3) comparison (patched / master)

values 100 1000 5000 10000
====================================================
10 110% 103% 102% 122%
100 123% 90% 71% 59%
1000 134% 144% 121% 1%

So clearly, the issue for large statistic targets is resolved (duration
drops from 500s to just 6s), but there is measurable regression for the
other cases. That needs more investigation & fix before commit.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
mcv-base-freq-fix.patch text/plain 4.2 KB
test.sql application/sql 2.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2019-06-21 13:21:42 Re: ldapbindpasswdfile
Previous Message Peter Eisentraut 2019-06-21 09:12:38 allow_system_table_mods stuff