Re: [RFC] Improving multi-column filter cardinality estimation using MCVs and HyperLogLog

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Improving multi-column filter cardinality estimation using MCVs and HyperLogLog
Date: 2022-05-24 22:16:43
Message-ID: Yo1ZS3ut2jDzmD/y@momjian.us
Lists: pgsql-hackers

On Mon, May 16, 2022 at 12:09:41AM +0200, Tomas Vondra wrote:
> I think it's an interesting idea. In principle it allows deducing the
> multi-column MCV for arbitrary combination of columns, not determined in
> advance. We'd have the MCV with HLL instead of frequencies for columns
> A, B and C:
>
> (a1, hll(a1))
> (a2, hll(a2))
> (...)
> (aK, hll(aK))
>
> (b1, hll(b1))
> (b2, hll(b2))
> (...)
> (bL, hll(bL))
>
> (c1, hll(c1))
> (c2, hll(c2))
> (...)
> (cM, hll(cM))
>
> and from this we'd be able to build MCV for any combination of those
> three columns.

Sorry, but I am lost here. I read about HLL here:

https://towardsdatascience.com/hyperloglog-a-simple-but-powerful-algorithm-for-data-scientists-aed50fe47869

However, I don't see how they can be combined for multiple columns.
Above, I know A, B, and C are columns, but what are a1, a2, etc.?
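
To make the question concrete, here is one possible reading of the quoted
scheme, offered only as a guess and not as what was actually proposed:
a1..aK are the most common values of column A, and hll(a1) is a HyperLogLog
sketch over identifiers of the rows where A = a1. Under that assumption,
sketches from different columns could be combined with a union plus
inclusion-exclusion, |X and Y| ~ |X| + |Y| - |X or Y|. The sketch below is
a toy in Python; the HLL class, build_sketches(), the use of a plain row
number as the row identifier, and the sample data are all hypothetical and
not PostgreSQL code.

```python
import hashlib
import math


class HLL:
    """Tiny HyperLogLog sketch, for illustration only."""

    def __init__(self, p=14):
        self.p = p                   # number of index bits
        self.m = 1 << p              # number of registers
        self.reg = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(repr(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        idx = h >> (64 - self.p)                       # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64-p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        if rank > self.reg[idx]:
            self.reg[idx] = rank

    def union(self, other):
        # The sketch of a set union is the register-wise maximum.
        out = HLL(self.p)
        out.reg = [max(a, b) for a, b in zip(self.reg, other.reg)]
        return out

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.reg)
        zeros = self.reg.count(0)
        if est <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            est = self.m * math.log(self.m / zeros)
        return est


def build_sketches(rows, col):
    """Assumed layout: one sketch of row ids per distinct value of a column."""
    sketches = {}
    for row_id, row in enumerate(rows):
        sketches.setdefault(row[col], HLL()).add(row_id)
    return sketches


# Made-up table with two columns, A = i % 4 and B = i % 6.
rows = [(i % 4, i % 6) for i in range(20000)]
by_a = build_sketches(rows, 0)
by_b = build_sketches(rows, 1)

est_a = by_a[0].count()                       # rows with A = 0
est_b = by_b[0].count()                       # rows with B = 0
est_union = by_a[0].union(by_b[0]).count()    # rows with A = 0 OR B = 0
est_both = est_a + est_b - est_union          # inclusion-exclusion for AND

true_both = sum(1 for a, b in rows if a == 0 and b == 0)
print(f"estimated A=0 AND B=0: {est_both:.0f}, actual: {true_both}")
```

If this reading is right, the estimate for the combined condition is a
difference of three noisy numbers, so its relative error grows as the true
overlap gets small compared to the individual sets.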

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
