O(N^2) when building multi-column MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: O(N^2) when building multi-column MCV lists
Date: 2019-06-18 21:43:13
Message-ID: 20190618214313.xrrufnkewq3x63lv@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

When experimenting with multi-column MCV lists with statistic target set
to high value (e.g. 10k), I've realized there's an O(N^2) issue in
statext_mcv_build() when computing base frequencies.

We do this:

for (i = 0; i < nitems; i++)
{
...
item->base_frequency = 1.0;
for (j = 0; j < numattrs; j++)
{
int count = 0;
int k;

for (k = 0; k < ngroups; k++)
{
if (multi_sort_compare_dim(j, &groups[i], &groups[k], mss) == 0)
count += groups[k].count;
}

...
}
}

That is, for each item on the MCV list, we walk through all the groups
(for each dimension independently) to determine the total frequency of
the value.

With many groups (which can easily happen for high statistics target),
this can easily get very expensive.

I think the best solution here is to pre-compute frequencies for values
in all dimensions, and then just access that instead of looping through
the groups over and over.

IMHO this is something we should fix for PG12, so I'll put that on the
open items list, and produce a fix shortly.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-06-18 22:16:40 PL/Python fails on new NetBSD/PPC 8.0 install
Previous Message Tomas Vondra 2019-06-18 21:33:57 Multivariate MCV list vs. statistics target